Endogenous retroviruses up-regulated in prostate cancer

ABSTRACT

Human endogenous retroviruses of the HML-2 family show up-regulated expression in prostate tumors. This finding can be used in prostate cancer screening, diagnosis and therapy.

All documents cited herein are incorporated by reference in their entirety.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 10/016,604 filed Dec. 7, 2001, now allowed, which claims the benefit of priority of U.S. Provisional Patent Application No. 60/251,830, filed Dec. 7, 2000. Each of these applications is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present invention relates to the diagnosis of cancer, particularly prostate cancer. In particular, it relates to a subgroup of human endogenous retroviruses (HERVs) which show up-regulated expression in tumors, particularly prostate tumors.

BACKGROUND ART

Prostate cancer is the most common type of cancer in men in the USA. Benign prostatic hyperplasia (BPH) is the abnormal growth of benign prostate cells in which the prostate grows and pushes against the urethra and bladder, blocking the normal flow of urine. More than half of the men in the USA between the ages of 60 and 70 and as many as 90 percent between the ages of 70 and 90 have symptoms of BPH. Although this condition is seldom a threat to life, it may require treatment to relieve symptoms.

Cancer that begins in the prostate is called primary prostate cancer (or prostatic cancer). Prostate cancer may remain in the prostate gland, or it may spread to nearby lymph nodes and may also spread to the bones, bladder, rectum, and other organs. Prostate cancer is diagnosed by measuring the levels of prostate-specific antigen (PSA) and prostatic acid phosphatase (PAP) in the blood. The level of PSA in blood may rise in men who have prostate cancer, BPH, or an infection in the prostate. The level of PAP rises above normal in many prostate cancer patients, especially if the cancer has spread beyond the prostate. However, one cannot diagnose prostate cancer with these tests alone because elevated PSA or PAP levels may also indicate other, non-cancerous problems.

In order to help determine whether conditions of the prostate are benign or malignant further tests such as transrectal ultrasonography, intravenous pyelogram, and cystoscopy are usually performed. If these test results suggest that cancer may be present, the patient must undergo a biopsy as the only sure way to diagnose prostate cancer. Consequently, it is desirable to provide a simple and direct test for the early detection and diagnosis of prostate cancer without having to undergo multiple rounds of cumbersome testing procedures. It is also desirable and necessary to provide compositions and methods for the prevention and/or treatment of prostate cancer.

It is an object of the invention to provide materials that can be used in the prevention, treatment and diagnosis of prostate cancer. It is a further object to provide improvements in the prevention, treatment and diagnosis of prostate cancer.

DISCLOSURE OF THE INVENTION

It has been found that human endogenous retroviruses (HERVs) of the HML-2 subgroup of the HERV-K family show up-regulated expression in prostate tumors. This finding can be used in prostate cancer screening, diagnosis and therapy.

The invention provides a method for diagnosing cancer, especially prostate cancer, the method comprising the step of detecting the presence or absence of an expression product of a HML-2 endogenous retrovirus in a patient sample. Higher levels of expression product relative to normal tissue indicate that the patient from whom the sample was taken has cancer.
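By way of illustration only, the comparison of expression levels against normal tissue can be sketched as a simple fold-change calculation. The function names, the input format (replicate measurements in arbitrary units) and the 1.5-fold cut-off used as a default are assumptions made for this sketch; no statistical confidence test is included.

```python
from statistics import mean


def fold_change(sample_levels, normal_levels):
    """Ratio of mean expression in the patient sample to mean expression
    in normal tissue (both in the same arbitrary units)."""
    normal_mean = mean(normal_levels)
    if normal_mean <= 0:
        raise ValueError("normal reference level must be positive")
    return mean(sample_levels) / normal_mean


def is_upregulated(sample_levels, normal_levels, min_fold=1.5):
    """True when the sample level is at least `min_fold` times the
    normal level. The 1.5 default is an illustrative assumption."""
    return fold_change(sample_levels, normal_levels) >= min_fold


if __name__ == "__main__":
    tumor = [30, 34, 32]    # hypothetical replicate measurements
    normal = [10, 12, 8]    # hypothetical normal-tissue measurements
    print(fold_change(tumor, normal), is_upregulated(tumor, normal))
```

In practice a significance test on replicate measurements would accompany the ratio; it is omitted here to keep the sketch short.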

The HML-2 expression product which is detected is either a mRNA transcript or a polypeptide translated from such a transcript. These expression products may be detected directly or indirectly. A direct test uses an assay which detects HML-2 RNA or polypeptide in a patient sample. An indirect test uses an assay which detects biomolecules which are not directly expressed in vivo from HML-2 e.g. an assay to detect cDNA which has been reverse-transcribed from a HML-2 mRNA, or an assay to detect an antibody which has been raised in response to a HML-2 polypeptide.

A—The Patient Sample

Where the diagnostic method of the invention is based on HML-2 mRNA, the patient sample will generally comprise cells, preferably, prostate cells. These may be present in a sample of tissue, preferably, prostate tissue, or may be cells, preferably, prostate cells which have escaped into circulation (e.g. during metastasis). Instead of or as well as comprising prostate cells, the sample may comprise virions which contain mRNA from HML-2.

Where the diagnostic method of the invention is based on HML-2 polypeptides, the patient sample may comprise cells, preferably, prostate cells and/or virions (as described above for mRNA), or may comprise antibodies which recognize HML-2 polypeptides. Such antibodies will typically be present in circulation.

In general, therefore, the patient sample is a tissue sample (e.g. a biopsy), preferably, a prostate sample (e.g. a biopsy) or a blood sample.

The patient is generally a human, preferably a human male, and more preferably an adult human male.

Expression products may be detected in the patient sample itself, or they may be detected in material derived from the sample (e.g. the supernatant of a cell lysate, or a RNA extract, or cDNA generated from a RNA extract, or polypeptides translated from a RNA extract, or cells derived from culture of cells extracted from a patient etc.). These are still considered to be “patient samples” within the meaning of the invention.

Methods of the invention can be conducted in vitro or in vivo.

Other possible sources of patient samples include isolated cells, whole tissues, or bodily fluids (e.g. blood, plasma, serum, urine, pleural effusions, cerebro-spinal fluid, etc.).

B—The mRNA Expression Product

Where the diagnostic method of the invention is based on mRNA detection, it typically involves detecting a RNA comprising six basic regions. From 5′ to 3′, these are:

1. A sequence which has at least 75% identity to SEQ ID NO:155 (e.g. 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity); or a sequence which has at least 50% identity to SEQ ID NO:155 (e.g. 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) and is expressed at least 1.5 fold (e.g. 2, 2.5, 5, 10, 20, 50, etc., fold) higher level relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or a sequence which has at least 80% identity (e.g. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) to at least a 20 contiguous nucleotide fragment (e.g. 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 115, 120, 125, 130, 135, 140, 145, etc., contiguous nucleotides) of SEQ ID NO:155; or a sequence which has at least 80% identity (e.g. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) to at least a 20 contiguous nucleotide fragment (e.g. 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 115, 120, 125, 130, 135, 140, 145, etc., contiguous nucleotides) of SEQ ID NO:155 and is expressed at least 1.5 fold (e.g. 2, 2.5, 5, 10, 20, 50, etc., fold) higher level relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level. This sequence will typically be at the 5′ end of the RNA. SEQ ID NO:155 is the nucleotide sequence of the start of R region in the LTR of the ‘ERVK6’ HML-2 virus [ref 1]. This portion of the R region is found in all full-length HML-2 transcripts.

2. A downstream region comprising a sequence which has at least 75% sequence identity to SEQ ID NO:156 (e.g. 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity); or a sequence which has at least 50% identity to SEQ ID NO:156 (e.g. 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) and is expressed at least 1.5 fold (e.g. 2, 2.5, 5, 10, 20, 50, etc., fold) higher level relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or a sequence which has at least 80% identity (e.g. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) to at least a 20 contiguous nucleotide fragment (e.g. 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, etc., contiguous nucleotides) of SEQ ID NO:156; or a sequence which has at least 80% identity (e.g. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) to at least a 20 contiguous nucleotide fragment (e.g. 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, etc., contiguous nucleotides) of SEQ ID NO:156 and is expressed at least 1.5 fold (e.g. 2, 2.5, 5, 10, 20, 50, etc., fold) higher level relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level. SEQ ID NO:156 is the nucleotide sequence of the RU5 region downstream of SEQ ID NO:155 in the ERVK6 LTR. This region is found in full-length HML-2 transcripts, but may not be present in all mRNAs transcribed from a HML-2 LTR promoter.

3. A downstream region comprising a sequence which has at least 75% sequence identity to SEQ ID NO:6 (e.g. 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity); or a sequence which has at least 50% identity to SEQ ID NO:6 (e.g. 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) and is expressed at least 1.5 fold (e.g. 2, 2.5, 5, 10, 20, 50, etc., fold) higher level relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or a sequence which has at least 80% identity (e.g. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) to at least a 20 contiguous nucleotide fragment (e.g. 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, etc., contiguous nucleotides) of SEQ ID NO:6; or a sequence which has at least 80% identity (e.g. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) to at least a 20 contiguous nucleotide fragment (e.g. 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, etc., contiguous nucleotides) of SEQ ID NO:6 and is expressed at least 1.5 fold (e.g. 2, 2.5, 5, 10, 20, 50, etc., fold) higher level relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level. SEQ ID NO:6 is the nucleotide sequence of the region of the ERVK6 virus between the U5 region and the first 5′ splice site. This region is found in full-length HML-2 transcripts, but has been lost by some variants and, like region 2 above, may not be present in all mRNAs transcribed from a HML-2 LTR promoter.

4. A downstream region comprising any RNA sequence. This region will typically comprise the coding sequence of one or more HML-2 polypeptides, but may alternatively comprise: a mutant viral coding sequence; a viral or non-viral non-coding sequence; or a non-viral coding sequence. Transcription of any of these sequences can come under the control of a HML-2 LTR.

5. A downstream region comprising a sequence which has at least 75% sequence identity to SEQ ID NO:5 (e.g. 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity); or a sequence which has at least 50% identity to SEQ ID NO:5 (e.g. 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) and is expressed at least 1.5 fold (e.g. 2, 2.5, 5, 10, 20, 50, etc., fold) higher level relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or a sequence which has at least 80% identity (e.g. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) to at least a 20 contiguous nucleotide fragment (e.g. 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, etc., contiguous nucleotides) of SEQ ID NO:5; or a sequence which has at least 80% identity (e.g. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, 100% identity) to at least a 20 contiguous nucleotide fragment (e.g. 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, etc., contiguous nucleotides) of SEQ ID NO:5 and is expressed at least 1.5 fold (e.g. 2, 2.5, 5, 10, 20, 50, etc., fold) higher level relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level. SEQ ID NO:5 is the nucleotide sequence of the U3R region in the 3′ end of ERVK6. This sequence will typically be near the 3′ end of the RNA, immediately preceding any polyA tail.

6. A 3′ polyA tail.

The percent identity of the sequences described above is determined by the Smith-Waterman algorithm using the default parameters: open gap penalty=−20 and extension penalty=−5.
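As a non-limiting sketch of how such a percent identity might be computed, the following implements a Smith-Waterman local alignment with affine gap penalties and reports identity over the best-scoring local alignment. Only the gap penalties come from the text above. The match and mismatch scores, the convention that a gap of length k costs open + (k − 1) × extension, and the definition of identity as identical columns divided by aligned columns are all assumptions of this sketch.

```python
def local_percent_identity(a, b, match=5, mismatch=-4,
                           gap_open=-20, gap_extend=-5):
    """Percent identity over the best Smith-Waterman local alignment.

    Each dynamic-programming cell stores (score, identical_columns,
    aligned_columns) so that no separate traceback is needed.
    """
    neg = (float("-inf"), 0, 0)
    zero = (0, 0, 0)
    rows, cols = len(a) + 1, len(b) + 1
    H = [[zero] * cols for _ in range(rows)]   # best alignment ending at (i, j)
    E = [[neg] * cols for _ in range(rows)]    # ends with a gap in `a`
    F = [[neg] * cols for _ in range(rows)]    # ends with a gap in `b`

    def score(cell):
        return cell[0]

    def add_gap(cell, penalty):
        # A gap column adds to the alignment length but not to identities.
        return (cell[0] + penalty, cell[1], cell[2] + 1)

    best = zero
    for i in range(1, rows):
        for j in range(1, cols):
            E[i][j] = max(add_gap(H[i][j - 1], gap_open),
                          add_gap(E[i][j - 1], gap_extend), key=score)
            F[i][j] = max(add_gap(H[i - 1][j], gap_open),
                          add_gap(F[i - 1][j], gap_extend), key=score)
            same = a[i - 1] == b[j - 1]
            prev = H[i - 1][j - 1]
            diag = (prev[0] + (match if same else mismatch),
                    prev[1] + (1 if same else 0),
                    prev[2] + 1)
            H[i][j] = max(zero, diag, E[i][j], F[i][j], key=score)
            if H[i][j][0] > best[0]:
                best = H[i][j]

    _, identical, aligned = best
    return 100.0 * identical / aligned if aligned else 0.0


if __name__ == "__main__":
    # Hypothetical short sequences, not taken from the sequence listing.
    print(local_percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Established alignment software would normally be used for this purpose; the sketch only makes the role of the two gap parameters concrete.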

These mRNA molecules are referred to below as “PCA-mRNA” molecules (“prostate cancer associated mRNA”), and endogenous viruses which express these PCA-mRNAs are referred to as PCAVs (“prostate cancer associated viruses”). Nevertheless, said PCAVs may also be associated with other types of cancer.

Although some PCA-mRNAs include all six of these regions, most HERVs are defective in that they have accumulated multiple stop codons, frameshifts, or larger deletions etc. This means that many PCA-mRNAs do not include all six regions. As all PCA-mRNAs are transcribed under the control of the same group of LTRs, however, transcription of all PCA-mRNAs is up-regulated in prostate tumors even though the mRNA may not encode functional polypeptides.

Where a mRNA to be detected is driven by 5′ LTR of HML-2 in genomic DNA, the first of these regions will always be present, but the remaining five are optional. Conversely, where a mRNA to be detected is controlled by 3′ LTR of HML-2, the fifth of these regions will always be present, but the remaining five are optional.

In general, therefore, the mRNA to be detected has the formula N₁—N₂—N₃—N₄—N₅—polyA, wherein:

- N₁ has at least 75% sequence identity to SEQ ID NO:155; or has at least 50% identity to SEQ ID NO:155 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or has at least 80% identity to at least a 20 contiguous nucleotide fragment of SEQ ID NO:155; or has at least 80% identity to at least a 20 contiguous nucleotide fragment of SEQ ID NO:155 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level;
- N₂ has at least 75% sequence identity to SEQ ID NO:156; or has at least 50% identity to SEQ ID NO:156 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or has at least 80% identity to SEQ ID NO:156 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or has at least 80% identity to at least a 20 contiguous nucleotide fragment of SEQ ID NO:156; or has at least 80% identity to at least a 20 contiguous nucleotide fragment of SEQ ID NO:156 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level;
- N₃ has at least 75% sequence identity to SEQ ID NO:6; or has at least 50% identity to SEQ ID NO:6 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or has at least 80% identity to at least a 20 contiguous nucleotide fragment of SEQ ID NO:6; or has at least 80% identity to at least a 20 contiguous nucleotide fragment of SEQ ID NO:6 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level;
- N₄ comprises any RNA sequence;
- N₅ has at least 75% sequence identity to SEQ ID NO:5; or has at least 50% identity to SEQ ID NO:5 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; or has at least 80% identity to at least a 20 contiguous nucleotide fragment of SEQ ID NO:5; or has at least 80% identity to at least a 20 contiguous nucleotide fragment of SEQ ID NO:5 and is expressed at least 1.5 fold higher relative to expression in a normal (i.e., non cancerous) cell with at least a 95% confidence level; and
- at least one of N₁, N₂, N₃, N₄ or N₅ is present, but polyA is optional.

Although only at least one of N₁, N₂, N₃, N₄ or N₅ needs to be present, it is preferred that two, three, four or five of these regions are present. It is preferred that at least one of N₁ and/or N₅ is present.
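The presence rules for the regions can be illustrated with a small sketch. The region labels follow the formula above; the function names, the string labels for the driving LTR and the plain-hyphen output format are assumptions introduced purely for illustration.

```python
REGIONS = ("N1", "N2", "N3", "N4", "N5")


def satisfies_formula(present, driven_by=None):
    """Check a set of detected regions against the general formula.

    present:   iterable of region labels found in the transcript
    driven_by: "5' LTR", "3' LTR" or None when unknown (hypothetical labels)
    """
    present = set(present)
    unknown = present - set(REGIONS)
    if unknown:
        raise ValueError(f"unknown region labels: {sorted(unknown)}")
    if not present:
        return False                      # at least one region is required
    if driven_by == "5' LTR" and "N1" not in present:
        return False                      # 5' LTR-driven mRNA always has N1
    if driven_by == "3' LTR" and "N5" not in present:
        return False                      # 3' LTR-controlled mRNA always has N5
    return True


def layout(present, has_polya=False):
    """Render the detected regions in 5' to 3' order, e.g. 'N1-N2-polyA'."""
    parts = [r for r in REGIONS if r in set(present)]
    if has_polya:
        parts.append("polyA")
    return "-".join(parts)


if __name__ == "__main__":
    print(satisfies_formula({"N1", "N2"}, driven_by="5' LTR"),
          layout({"N5", "N1"}, has_polya=True))
```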

N₁ is preferably present in the mRNA to be detected (i.e. the invention is preferably based on the detection of mRNA driven by a 5′ LTR). More preferably, at least N₁—N₂ is present.

Where N₁ is present, it is preferably at the 5′ end of the mRNA (i.e. 5′-N₁— . . . ).

Where N₅ is present, it is preferably immediately before a 3′ polyA tail (i.e. . . . —N₅-polyA-3′).

Where N₄ is present, it preferably comprises a polypeptide-coding sequence (e.g. encoding a HML-2 polypeptide). Examples of HML-2 polypeptide-coding sequences are described below.

The RNA will generally have a 5′ cap.

B.1—Enriching RNA in a Sample

Where diagnosis is based on mRNA detection, the method of the invention preferably comprises an initial step of: (a) extracting RNA (e.g. mRNA) from a patient sample; (b) removing DNA from a patient sample without removing mRNA; and/or (c) removing or disrupting DNA which comprises SEQ ID NO:4, but not RNA which comprises SEQ ID NO:4, from a patient sample. This is necessary because the genomes of both normal and cancerous prostate cells contain multiple PCAV DNA templates, whereas increased PCA-mRNA levels are only found in cancerous cells. As an alternative, a RNA-specific assay can be used which is not affected by the presence of homologous DNA.

Methods for extracting RNA from biological samples are well known [e.g. refs. 2 & 8] and include methods based on guanidinium buffers, lithium chloride, SDS/potassium acetate etc. After total cellular RNA has been extracted, mRNA may be enriched e.g. using oligo-dT techniques.

Methods for removing DNA from biological samples without removing mRNA are well known [e.g. appendix C of ref. 2] and include DNase digestion.

Methods for removing DNA, but not RNA, comprising PCA-mRNA sequences will use a reagent which is specific to a sequence within a PCA-mRNA e.g. a restriction enzyme which recognizes a DNA sequence within SEQ ID NO:4, but which does not cleave the corresponding RNA sequence.

Methods for specifically purifying PCA-mRNAs from a sample may also be used. One such method uses an affinity support which binds to PCA-mRNAs. The affinity support may include a polypeptide sequence which binds to the PCA-mRNA e.g. the cORF polypeptide, which binds to the LTR of HERV-K mRNAs in a sequence-specific manner, or HIV Rev protein, which has been shown to recognize the HERV-K LTR [3].

B.2—Direct Detection of RNA

Various techniques are available for detecting the presence or absence of a particular RNA sequence in a sample [e.g. refs. 2 & 8]. If a sample contains genomic PCAV DNA, the detection technique will generally be RNA-specific; if the sample contains no PCAV DNA, the detection technique may or may not be RNA-specific.

Hybridization-based detection techniques may be used, in which a polynucleotide probe complementary to a region of PCA-mRNA is contacted with a RNA-containing sample under hybridizing conditions. Detection of hybridization indicates that nucleic acid complementary to the probe is present. Hybridization techniques for use with RNA include Northern blots, in situ hybridization and arrays.

Sequencing may also be used, in which the sequence(s) of RNA molecules in a sample are obtained. These techniques reveal directly whether a sequence of interest is present in a sample. Sequence determination of the 5′ end of a RNA corresponding to N₁ will generally be adequate.

Amplification-based techniques may also be used. These include PCR, SDA, SSSR, LCR, TMA, NASBA, T7 amplification etc. The technique preferably gives exponential amplification. A preferred technique for use with RNA is RT-PCR [e.g. see chapter 15 of ref. 2]. RT-PCR of mRNA from prostate cells is reported in references 4, 5, 6 & 7.

B.3—Indirect Detection of RNA

Rather than detect RNA directly, it may be preferred to detect molecules which are derived from RNA (i.e. indirect detection of RNA). A typical indirect method of detecting mRNA is to prepare cDNA by reverse transcription and then to directly detect the cDNA. Direct detection of cDNA will generally use the same techniques as described above for direct detection of RNA (but it will be appreciated that methods such as RT-PCR are not suitable for DNA detection and that cDNA is double-stranded, so detection techniques can be based on a sequence, on its complement, or on the double-stranded molecule).
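The relationship between a mRNA and the two strands of its cDNA can be sketched as follows. The function names and the example sequence are hypothetical, and the sketch treats reverse transcription as an exact base-by-base copy.

```python
# DNA base that pairs with each RNA base.
DNA_PARTNER_OF_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """First-strand cDNA, written 5' to 3': the DNA reverse complement
    of the mRNA."""
    return "".join(DNA_PARTNER_OF_RNA[base] for base in reversed(mrna.upper()))


def second_strand_cdna(mrna):
    """Second-strand cDNA, written 5' to 3': the mRNA sequence in the
    DNA alphabet (U replaced by T)."""
    return mrna.upper().replace("U", "T")


def detection_targets(mrna):
    """Both strands of the double-stranded cDNA; a probe may be designed
    against either strand."""
    return {"sense": second_strand_cdna(mrna),
            "antisense": first_strand_cdna(mrna)}


if __name__ == "__main__":
    print(detection_targets("AUGGCC"))   # hypothetical 6-nt example
```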

B.4—Polynucleotide Materials

The invention provides polynucleotide materials for use in the detection of PCAV nucleic acids.

The invention provides an isolated polynucleotide comprising: (a) the nucleotide sequence N₁—N₂—N₃—N₄—N₅-polyA as defined above; (b) a fragment of at least x nucleotides of nucleotide sequence N₁—N₂—N₃—N₄—N₅ as defined above; (c) a nucleotide sequence having at least s % identity to nucleotide sequence N₁—N₂—N₃—N₄—N₅ as defined above; or (d) the complement of (a), (b) or (c). These polynucleotides include variants of nucleotide sequence N₁—N₂—N₃—N₄—N₅-polyA (e.g. degenerate variants, allelic variants, homologs, orthologs, mutants etc.).

Fragment (b) is preferably a fragment of N₁.

The value of x is at least 7 (e.g. at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, 100 etc.). The value of x may be less than 2000 (e.g. less than 1000, 500, 100, or 50).

The value of s is preferably at least 50 (e.g. at least 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 99.5, 99.9 etc.).

The invention also provides an isolated polynucleotide having formula 5′-A-B-C-3′, wherein: -A- is a nucleotide sequence consisting of a nucleotides; -C- is a nucleotide sequence consisting of c nucleotides; -B- is a nucleotide sequence consisting of either (a) a fragment of b nucleotides of nucleotide sequence N₁—N₂—N₃—N₄—N₅ as defined above or (b) the complement of a fragment of b nucleotides of nucleotide sequence N₁—N₂—N₃—N₄—N₅ as defined above; and said polynucleotide is neither (a) a fragment of nucleotide sequence N₁—N₂—N₃—N₄—N₅ nor (b) the complement of a fragment of nucleotide sequence N₁—N₂—N₃—N₄—N₅.

The -B- moiety is preferably a fragment of N₁—N₂, and more preferably a fragment of N₁. The -A- and/or -C- moieties may comprise a promoter sequence (or its complement) e.g. for use in TMA.

The value of a+c is at least 1 (e.g. at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.). The value of b is at least 7 (e.g. at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.). It is preferred that the value of a+b+c is at least 9 (e.g. at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.). It is preferred that the value of a+b+c is at most 500 (e.g. at most 450, 400, 350, 300, 250, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9).

Where -B- is a fragment of N₁—N₂—N₃—N₄—N₅, the nucleotide sequence of -A- typically shares less than n % sequence identity to the a nucleotides which are 5′ of sequence -B- in N₁—N₂—N₃—N₄—N₅ and/or the nucleotide sequence of -C- typically shares less than n % sequence identity to the c nucleotides which are 3′ of sequence -B- in N₁—N₂—N₃—N₄—N₅. Similarly, where -B- is the complement of a fragment of N₁—N₂—N₃—N₄—N₅, the nucleotide sequence of -A- typically shares less than n % sequence identity to the complement of the a nucleotides which are 5′ of the complement of sequence -B- in N₁—N₂—N₃—N₄—N₅ and/or the nucleotide sequence of -C- typically shares less than n % sequence identity to the complement of the c nucleotides which are 3′ of the complement of sequence -B- in N₁—N₂—N₃—N₄—N₅. The value of n is generally 60 or less (e.g. 50, 40, 30, 20, 10 or less).
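The length and fragment constraints on a 5′-A-B-C-3′ polynucleotide can be illustrated by a short check. The target sequence below is a made-up placeholder rather than any sequence of the sequence listing, "complement" is interpreted as the reverse complement, and the function names are assumptions of this sketch. The n % identity condition on the flanking moieties is not checked.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_valid_abc(a_seq, b_seq, c_seq, target,
                 min_b=7, min_total=9, max_total=500):
    """Check the stated constraints on a 5'-A-B-C-3' polynucleotide.

    a+c >= 1, b >= min_b, min_total <= a+b+c <= max_total, -B- is a
    fragment of the target (or of its reverse complement), and the whole
    polynucleotide is not itself such a fragment.
    """
    whole = a_seq + b_seq + c_seq
    rc_target = reverse_complement(target)
    if len(a_seq) + len(c_seq) < 1:
        return False
    if len(b_seq) < min_b:
        return False
    if not (min_total <= len(whole) <= max_total):
        return False
    if b_seq not in target and b_seq not in rc_target:
        return False
    if whole in target or whole in rc_target:
        return False
    return True


# Placeholder target standing in for N1-N2-N3-N4-N5 (hypothetical sequence).
TARGET = "ACGTTGCAAGGCTTAACCGGATCGATCG"

if __name__ == "__main__":
    probe_core = TARGET[4:14]             # 10-nt fragment used as -B-
    print(is_valid_abc("GGGGG", probe_core, "", TARGET))
```

A non-target 5′ extension such as a promoter sequence would play the role of the "GGGGG" placeholder used for -A- here.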

The invention also provides an isolated polynucleotide which selectively hybridizes to a nucleic acid having nucleotide sequence N₁—N₂—N₃—N₄—N₅ as defined above or to a nucleic acid having the complement of nucleotide sequence N₁—N₂—N₃—N₄—N₅ as defined above. The polynucleotide preferably hybridizes to at least N₁.

Hybridization reactions can be performed under conditions of different “stringency”. Conditions that increase stringency of a hybridization reaction are widely known and published in the art [e.g. page 7.52 of reference 8]. Examples of relevant conditions include (in order of increasing stringency): incubation temperatures of 25° C., 37° C., 50° C., 55° C. and 68° C.; buffer concentrations of 10×SSC, 6×SSC, 1×SSC, 0.1×SSC (where SSC is 0.15 M NaCl and 15 mM citrate buffer) and their equivalents using other buffer systems; formamide concentrations of 0%, 25%, 50%, and 75%; incubation times from 5 minutes to 24 hours; 1, 2, or more washing steps; wash incubation times of 1, 2, or 15 minutes; and wash solutions of 6×SSC, 1×SSC, 0.1×SSC, or de-ionized water. Hybridization techniques are well known in the art [e.g. see references 2, 8, 9, 10, 11 etc.]. Depending upon the particular polynucleotide sequence and the particular domain encoded by that polynucleotide sequence, hybridization conditions upon which to compare a polynucleotide of the invention to a known polynucleotide may differ, as will be understood by the skilled artisan.

In some embodiments, the isolated polynucleotide of the invention selectively hybridizes under low stringency conditions; in other embodiments it selectively hybridizes under intermediate stringency conditions; in other embodiments, it selectively hybridizes under high stringency conditions. An exemplary set of low stringency hybridization conditions is 50° C. and 10×SSC. An exemplary set of intermediate stringency hybridization conditions is 55° C. and 1×SSC. An exemplary set of high stringency hybridization conditions is 68° C. and 0.1×SSC.
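The exemplary stringency conditions can be tabulated and converted to buffer concentrations as in the following sketch. The temperatures, SSC strengths and the 1×SSC composition are those stated above; the dictionary keys and function names are assumptions of this sketch.

```python
# Composition of 1x SSC as stated in the text.
SSC_1X = {"NaCl_M": 0.15, "citrate_mM": 15.0}

# Exemplary conditions: stringency level -> (temperature in deg C, SSC strength).
EXEMPLARY_CONDITIONS = {
    "low": (50, 10.0),
    "intermediate": (55, 1.0),
    "high": (68, 0.1),
}


def buffer_composition(ssc_strength):
    """Scale the 1x SSC composition to the requested strength."""
    return {name: value * ssc_strength for name, value in SSC_1X.items()}


def describe(level):
    """Temperature, SSC strength and salt concentrations for one level."""
    temperature_c, ssc_strength = EXEMPLARY_CONDITIONS[level]
    return {"temperature_C": temperature_c,
            "SSC": ssc_strength,
            **buffer_composition(ssc_strength)}


if __name__ == "__main__":
    for level in EXEMPLARY_CONDITIONS:
        print(level, describe(level))
```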

The polynucleotides of the invention are particularly useful as probes and/or as primers for use in hybridization and/or amplification reactions.

More than one polynucleotide of the invention can hybridize to the same nucleic acid target (e.g. more than one can hybridize to a single RNA).

References to a percentage sequence identity between two nucleic acid sequences mean that, when aligned, that percentage of bases are the same in comparing the two sequences. This alignment and the percent homology or sequence identity can be determined using software programs known in the art, for example those described in section 7.7.18 of reference 11. A preferred alignment program is GCG Gap (Genetics Computer Group, Wisconsin, Suite Version 10.1), preferably using default parameters, which are as follows: open gap=3; extend gap=1.

Polynucleotides of the invention may take various forms e.g. single-stranded, double-stranded, linear, circular, vectors, primers, probes etc.

Polynucleotides of the invention can be prepared in many ways e.g. by chemical synthesis (at least in part), by digesting longer polynucleotides using restriction enzymes, from genomic or cDNA libraries, from the organism itself etc.

Polynucleotides of the invention may be attached to a solid support (e.g. a bead, plate, filter, film, slide, resin, etc.).

Polynucleotides of the invention may include a detectable label (e.g. a radioactive or fluorescent label, or a biotin label). This is particularly useful where the polynucleotide is to be used in nucleic acid detection techniques e.g. where the nucleic acid is used as a primer or as a probe in techniques such as PCR, LCR, TMA, NASBA, bDNA etc.

The term “polynucleotide” in general means a polymeric form ofnucleotides of any length, which contain deoxyribonucleotides,ribonucleotides, and/or their analogs. It includes DNA, RNA, DNA/RNAhybrids, and DNA or RNA analogs, such as those containing modifiedbackbones or bases, and also peptide nucleic acids (PNA) etc. The term“polynucleotide” is not intended to be limiting as to the length orstructure of a nucleic acid unless specifically indicated, and thefollowing are non-limiting examples of polynucleotides: a gene or genefragment, exons, introns, mRNA, tRNA, rRNA, ribozymes, cDNA, recombinantpolynucleotides, branched polynucleotides, plasmids, vectors, anyisolated DNA from any source, any isolated RNA from any sequence,nucleic acid probes, and primers. Polynucleotides may have anythree-dimensional structure, and may perform any function, known orunknown. Unless otherwise specified or required, any embodiment of theinvention that includes a polynucleotide encompasses both thedouble-stranded form and each of two complementary single-stranded formsknown or predicted to make up the double stranded form.

Polynucleotides of the invention may be isolated and obtained insubstantial purity, generally as other than an intact chromosome.Usually, the polynucleotides will be obtained substantially free ofother naturally-occurring nucleic acid sequences, generally being atleast about 50% (by weight) pure, usually at least about 90% pure.

Polynucleotides of the invention (particularly DNA) are typically “recombinant” e.g. flanked by one or more nucleotides with which they are not normally associated on a naturally-occurring chromosome.

The polynucleotides can be used, for example: to produce polypeptides; as probes for the detection of nucleic acid in biological samples; to generate additional copies of the polynucleotides; to generate ribozymes or antisense oligonucleotides; and as single-stranded DNA probes or as triple-strand forming oligonucleotides. The polynucleotides are preferably used to detect PCA-mRNAs.

A “vector” is a polynucleotide construct designed for transduction/transfection of one or more cell types. Vectors may be, for example, “cloning vectors” which are designed for isolation, propagation and replication of inserted nucleotides, “expression vectors” which are designed for expression of a nucleotide sequence in a host cell, “viral vectors” which are designed to result in the production of a recombinant virus or virus-like particle, or “shuttle vectors”, which comprise the attributes of more than one type of vector.

A “host cell” includes an individual cell or cell culture which can be or has been a recipient of exogenous polynucleotides. Host cells include progeny of a single host cell, and the progeny may not necessarily be completely identical (in morphology or in total DNA complement) to the original parent cell due to natural, accidental, or deliberate mutation and/or change. A host cell includes cells transfected or infected in vivo or in vitro with a polynucleotide of this invention.

B.5—Nucleic Acid Detection Kits

The invention provides a kit comprising primers (e.g. PCR primers) for amplifying a template sequence contained within a PCAV nucleic acid, the kit comprising a first primer and a second primer, wherein the first primer is substantially complementary to said template sequence and the second primer is substantially complementary to a complement of said template sequence, wherein the parts of said primers which have substantial complementarity define the termini of the template sequence to be amplified. The first primer and/or the second primer may include a detectable label.

The invention also provides a kit comprising first and second single-stranded oligonucleotides which allow amplification of a PCAV template nucleic acid sequence contained in a single- or double-stranded nucleic acid (or mixture thereof), wherein: (a) the first oligonucleotide comprises a primer sequence which is substantially complementary to said template nucleic acid sequence; (b) the second oligonucleotide comprises a primer sequence which is substantially complementary to the complement of said template nucleic acid sequence; (c) the first oligonucleotide and/or the second oligonucleotide comprise(s) sequence which is not complementary to said template nucleic acid; and (d) said primer sequences define the termini of the template sequence to be amplified. The non-complementary sequence(s) of feature (c) are preferably upstream of (i.e. 5′ to) the primer sequences. One or both of the (c) sequences may comprise a restriction site [12] or promoter sequence [13]. The first and/or the second oligonucleotide may include a detectable label.
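
By way of illustration only, the way in which two primers define the termini of the amplified template can be sketched in Python. The sketch assumes exact (rather than merely substantial) complementarity, no non-complementary 5′ tails, and a target written 5′ to 3′ on the sense strand. The function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
from typing import Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def amplicon(target: str, sense_primer: str, antisense_primer: str) -> Optional[str]:
    """Return the stretch of `target` bounded by the two primers, or None.

    `sense_primer` is complementary to the complement of the template, so it
    reads the same as the 5' end of the template. `antisense_primer` is
    complementary to the template itself, so its reverse complement is
    searched for downstream of the sense primer.
    """
    target = target.upper()
    start = target.find(sense_primer.upper())
    if start == -1:
        return None
    downstream_site = reverse_complement(antisense_primer)
    end = target.find(downstream_site, start + len(sense_primer))
    if end == -1:
        return None
    # The primer binding sites form the two termini of the amplified template.
    return target[start:end + len(downstream_site)]


if __name__ == "__main__":
    # Hypothetical 26-nucleotide target, for illustration only.
    print(amplicon("GGGATGCCATTGACCTTAGCAAACCC", "ATGCC", "TTTGC"))
```

A fuller model would tolerate mismatches and would strip any 5′ tail of feature (c) before searching for the binding site.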

The kit of the invention may also comprise a labeled polynucleotide which comprises a fragment of the template sequence (or its complement). This can be used in a hybridization technique to detect amplified template.

The primers and probes used in these kits are preferably polynucleotides as described in section B.4.

The template is preferably a sequence as defined in section B.1 above.

C—Polypeptide Expression Product

Where the method is based on polypeptide detection, it will involve detecting expression of a polypeptide encoded by a PCAV-mRNA. This will typically involve detecting one or more of the following HML-2 polypeptides: gag, prt, pol, env, cORF. Although some PCA-mRNAs encode all of these polypeptides (e.g. ERVK6 [1]), the polypeptide-coding regions of most HERVs (including PCAVs) contain mutations which mean that one or more coding-regions in the mRNA transcript are either mutated or absent. Thus not all PCAVs have the ability to encode all HML-2 polypeptides.

The transcripts which encode HML-2 polypeptides are generated by alternative splicing of the full-length mRNA copy of the endogenous genome [e.g. FIG. 4 of ref. 143].

HML-2 gag polypeptide is encoded by the first long ORF in a complete HML-2 genome [140]. Full-length gag polypeptide is proteolytically cleaved.

Examples of gag nucleotide sequences are: SEQ ID NOS:7, 8, 9 & 11 [HERV-K(CH)]; SEQ ID NO:85 [HERV-K108]; SEQ ID NO:91 [HERV-K(C7)]; SEQ ID NO:97 [HERV-K(II)]; SEQ ID NO:102 [HERV-K10].

Examples of gag polypeptide sequences are: SEQ ID NOS:46, 47, 48, 49, 56 & 57 [HERV-K(CH)]; SEQ ID NO:92 [HERV-K(C7)]; SEQ ID NO:98 [HERV-K(II)]; SEQ ID NOS:103 & 104 [HERV-K10]; SEQ ID NO:146 [‘ERVK6’].

An alignment of gag polypeptide sequences is shown in FIG. 7.

HML-2 prt polypeptide is encoded by the second long ORF in a complete HML-2 genome. It is translated as a gag-prt fusion polypeptide. The fusion polypeptide is proteolytically cleaved to

Examples of prt nucleotide sequences are: SEQ ID NO:86 [HERV-K(108)]; SEQ ID NO:99 [HERV-K(II)]; SEQ ID NO:105 [HERV-K10].

Examples of prt polypeptide sequences are: SEQ ID NO:106 [HERV-K10]; SEQ ID NO:147 [‘ERVK6’].

HML-2 pol polypeptide is encoded by the third long ORF in a complete HML-2 genome. It is translated as a gag-prt-pol fusion polypeptide. The fusion polypeptide is proteolytically cleaved to give three pol products: reverse transcriptase, endonuclease and integrase [14].

Examples of pol nucleotide sequences are: SEQ ID NO:87 [HERV-K(108)]; SEQ ID NO:93 [HERV-K(C7)]; SEQ ID NO:100 [HERV-K(II)]; SEQ ID NO:107 [HERV-K10].

Examples of pol polypeptide sequences are: SEQ ID NO:94 [HERV-K(C7)]; SEQ ID NO:108 [HERV-K10]; SEQ ID NO:148 [‘ERVK6’].

An alignment of pol polypeptide sequences is shown in FIG. 8.

HML-2 env polypeptide is encoded by the fourth long ORF in a complete HML-2 genome. The translated polypeptide is proteolytically cleaved.

Examples of env nucleotide sequences are: SEQ ID NO:88 [HERV-K(108)]; SEQ ID NO:95 [HERV-K(C7)]; SEQ ID NO:101 [HERV-K(II)]; SEQ ID NO:107 [HERV-K10].

Examples of env polypeptide sequences are: SEQ ID NO:96 [HERV-K(C7)]; SEQ ID NO:108 [HERV-K10]; SEQ ID NO:149 [‘ERVK6’].

Alignments of env polynucleotide and polypeptide sequences are shown in FIGS. 6 and 9.

HML-2 cORF polypeptide is encoded by an ORF which shares the same 5′ region and start codon as env. After amino acid 87, a splicing event removes env-coding sequences and the cORF-coding sequence continues in the reading frame +1 relative to that of env [15, 16; see below]. cORF has also been called Rec [17].

Examples of cORF nucleotide sequences are: SEQ ID NO:89 and SEQ ID NO:90 [HERV-K(108)].

An example of a cORF polypeptide sequence is SEQ ID NO:109.

C.1—Direct Detection of HML-2 Polypeptides

Various techniques are available for detecting the presence or absence of a particular polypeptide in a sample. These are generally immunoassay techniques which are based on the specific interaction between an antibody and an antigenic amino acid sequence in the polypeptide. Suitable techniques include standard immunohistological methods, immunoprecipitation, immunofluorescence, ELISA, RIA, FIA, etc.

In general, therefore, the invention provides a method for detecting the presence of and/or measuring a level of a polypeptide of the invention in a biological sample, wherein the method uses an antibody specific for the polypeptide. The method generally comprises the steps of: a) contacting the sample with an antibody specific for the polypeptide; and b) detecting binding between the antibody and polypeptides in the sample.

Polypeptides of the invention can also be detected by functional assays e.g. assays to detect binding activity or enzymatic activity. For instance, a functional assay for cORF is disclosed in references 16, 129 & 130. A functional assay for the protease is disclosed in reference 140.

Another way of detecting polypeptides of the invention is to use standard proteomics techniques e.g. purify or separate polypeptides and then use peptide sequencing. For example, polypeptides can be separated using 2D-PAGE and polypeptide spots can be sequenced (e.g. by mass spectrometry) in order to identify if a sequence is present in a target polypeptide.

Detection methods may be adapted for use in vivo (e.g. to locate or identify sites where cancer cells are present). In these embodiments, an antibody specific for a target polypeptide is administered to an individual (e.g. by injection) and the antibody is located using standard imaging techniques (e.g. magnetic resonance imaging, computed tomography scanning, etc.). Appropriate labels (e.g. spin labels etc.) will be used. Using these techniques, cancer cells are differentially labeled.

An immunofluorescence assay can be easily performed on cells without the need for purification of the target polypeptide. The cells are first fixed onto a solid support, such as a microscope slide or microtiter well. The membranes of the cells are then permeabilized in order to permit entry of polypeptide-specific antibody (NB: fixing and permeabilization can be achieved together). Next, the fixed cells are exposed to an antibody which is specific for the encoded polypeptide and which is fluorescently labeled. The presence of this label (e.g. visualized under a microscope) identifies cells which express the target PCAV polypeptide. To increase the sensitivity of the assay, it is possible to use a second antibody to bind to the anti-PCAV antibody, with the label being carried by the second antibody [18].

C.2—Indirect Detection of HML-2 Polypeptides

Rather than detect polypeptides directly, it may be preferred to detect molecules which are produced by the body in response to a polypeptide (i.e. indirect detection of a polypeptide). This will typically involve the detection of antibodies, so the patient sample will generally be a blood sample. Antibodies can be detected by conventional immunoassay techniques e.g. using PCAV polypeptides of the invention, which will typically be immobilized.

Antibodies against HERV-K polypeptides have been detected in humans [143].

C.3—Polypeptide Materials

The invention provides polypeptides for use in the detection methods of the invention. In general, these polypeptides will be encoded by PCA-mRNAs e.g. by sequence(s) in the —N₄— region.

The invention provides an isolated polypeptide comprising: (a) an amino acid sequence selected from the group consisting of SEQ ID NOS:109 (cORF), 146 (gag), 147 (prt), 148 (pol), 149 (env); (b) a fragment of at least x amino acids of (a); or (c) a polypeptide sequence having at least s % identity to (a). These polypeptides include variants (e.g. allelic variants, homologs, orthologs, mutants etc.).

The value of x is at least 5 (e.g. at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, 100 etc.). The value of x may be less than 2000 (e.g. less than 1000, 500, 100, or 50).

The value of s is preferably at least 50 (e.g. at least 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 99.5, 99.9 etc.).
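
As a minimal sketch of alternative (b), the following Python function tests whether a candidate is a contiguous fragment of at least x amino acids of a reference sequence. The function name is hypothetical, and the reference used in the example is an arbitrary placeholder string, not any sequence from the sequence listing.

```python
def is_qualifying_fragment(candidate: str, reference: str, x: int = 5) -> bool:
    """Return True if `candidate` is a contiguous stretch of `reference`
    that is at least `x` amino acids long (x itself being at least 5)."""
    if x < 5:
        raise ValueError("x is at least 5")
    return len(candidate) >= x and candidate in reference


if __name__ == "__main__":
    placeholder = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary placeholder, not a SEQ ID
    print(is_qualifying_fragment("AKQRQ", placeholder))   # long enough, contiguous
    print(is_qualifying_fragment("AKQ", placeholder))     # too short
    print(is_qualifying_fragment("AKQRW", placeholder))   # not present
```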

The invention also provides an isolated polypeptide having formula NH2-A-B-C-COOH, wherein: A is a polypeptide sequence consisting of a amino acids; C is a polypeptide sequence consisting of c amino acids; B is a polypeptide sequence consisting of a fragment of b amino acids of an amino acid sequence selected from the group consisting of SEQ ID NOS:109, 146, 147, 148, 149; and said polypeptide is not a fragment of polypeptide sequence SEQ ID NO:109, 146, 147, 148 or 149.

The value of a+c is at least 1 (e.g. at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.). The value of b is at least 5 (e.g. at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.). It is preferred that the value of a+b+c is at least 9 (e.g. at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.). It is preferred that the value of a+b+c is at most 500 (e.g. at most 450, 400, 350, 300, 250, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9).

The amino acid sequence of -A- typically shares less than n % sequence identity to the a amino acids which are N-terminal of sequence -B- in SEQ ID NO:109, 146, 147, 148 or 149 and the amino acid sequence of -C- typically shares less than n % sequence identity to the c amino acids which are C-terminal of sequence -B- in SEQ ID NO:109, 146, 147, 148 or 149. The value of n is generally 60 or less (e.g. 50, 40, 30, 20, 10 or less).
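
The NH2-A-B-C-COOH arrangement can be illustrated with a short Python sketch. It is a sketch under stated assumptions: -B- is taken to be the longest contiguous stretch that the polypeptide shares with the reference, the flanks are taken as -A- and -C-, and the n % identity condition on the flanks is not checked. The function names and example sequences are hypothetical placeholders.

```python
from typing import Optional, Tuple


def longest_shared_fragment(poly: str, ref: str) -> Tuple[int, int]:
    """Return (start in poly, length) of the longest contiguous stretch of
    `poly` that also occurs in `ref` (longest common substring)."""
    best_start, best_len = 0, 0
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(poly) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if poly[i - 1] == ref[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_start = i - cur[j]
        prev = cur
    return best_start, best_len


def split_abc(poly: str, ref: str, min_b: int = 5) -> Optional[Tuple[str, str, str]]:
    """Split `poly` into (A, B, C), or return None if it does not fit.

    B must have at least `min_b` residues and a+c must be at least 1, which
    also guarantees that `poly` is not itself a fragment of `ref`.
    """
    start, length = longest_shared_fragment(poly, ref)
    if length < min_b:
        return None
    a, b, c = poly[:start], poly[start:start + length], poly[start + length:]
    if len(a) + len(c) < 1:
        return None
    return a, b, c


if __name__ == "__main__":
    placeholder = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary placeholder, not a SEQ ID
    print(split_abc("GSHHAYIAKQRQISGG", placeholder))
```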

The fragment of (b) may comprise a T-cell or, preferably, a B-cell epitope of SEQ ID NO:109, 146, 147, 148 or 149. T- and B-cell epitopes can be identified empirically (e.g. using the PEPSCAN method [19, 20] or similar methods), or they can be predicted (e.g. using the Jameson-Wolf antigenic index [21], matrix-based approaches [22], TEPITOPE [23], neural networks [24], OptiMer & EpiMer [25, 26], ADEPT [27], Tsites [28], hydrophilicity [29], antigenic index [30] or the methods disclosed in reference 31 etc.).

References to a percentage sequence identity between two amino acid sequences mean that, when aligned, that percentage of amino acids are the same in comparing the two sequences. This alignment and the percent homology or sequence identity can be determined using software programs known in the art, for example those described in section 7.7.18 of reference 11. A preferred alignment is determined by the Smith-Waterman homology search algorithm using an affine gap search with a gap open penalty of 12 and a gap extension penalty of 2, BLOSUM matrix of 62. The Smith-Waterman homology search algorithm is taught in reference 32.
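
For illustration, a Smith-Waterman local alignment with affine gaps (gap open 12, gap extension 2) can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements about the programs cited above: the scorer `toy_score` is a placeholder and is not the BLOSUM62 matrix (a real BLOSUM62 lookup would be passed as `score`); the first gapped position is charged the open penalty and each further position the extension penalty; and percent identity is taken over the columns of the best local alignment, counted during the dynamic programming rather than by traceback.

```python
from typing import Callable, Tuple

# A DP state: (alignment score, identical columns, total aligned columns).
State = Tuple[int, int, int]
ZERO: State = (0, 0, 0)
NEG: State = (-10**9, 0, 0)  # stands in for minus infinity


def toy_score(x: str, y: str) -> int:
    """Placeholder substitution score (assumption): NOT the BLOSUM62 matrix."""
    return 5 if x == y else -4


def _gap(state: State, penalty: int) -> State:
    """Extend a path by one gap column, charging the given penalty."""
    return (state[0] - penalty, state[1], state[2] + 1)


def local_identity(a: str, b: str,
                   score: Callable[[str, str], int] = toy_score,
                   gap_open: int = 12, gap_extend: int = 2) -> Tuple[int, float]:
    """Return (best local alignment score, percent identity over its columns)."""
    h_prev = [ZERO] * (len(b) + 1)   # best score ending at (i-1, j)
    f_prev = [NEG] * (len(b) + 1)    # paths ending in a gap in b
    best = ZERO
    for i in range(1, len(a) + 1):
        h_cur = [ZERO] * (len(b) + 1)
        f_cur = [NEG] * (len(b) + 1)
        e = NEG                      # paths ending in a gap in a
        for j in range(1, len(b) + 1):
            e = max(_gap(h_cur[j - 1], gap_open), _gap(e, gap_extend))
            f_cur[j] = max(_gap(h_prev[j], gap_open), _gap(f_prev[j], gap_extend))
            d = h_prev[j - 1]
            same = a[i - 1] == b[j - 1]
            m = (d[0] + score(a[i - 1], b[j - 1]), d[1] + int(same), d[2] + 1)
            cand = max(m, e, f_cur[j])
            # Local alignment: a non-positive score restarts the alignment.
            h_cur[j] = cand if cand[0] > 0 else ZERO
            if h_cur[j][0] > best[0]:
                best = h_cur[j]
        h_prev, f_prev = h_cur, f_cur
    s, identical, columns = best
    return s, (100.0 * identical / columns if columns else 0.0)


if __name__ == "__main__":
    # Arbitrary placeholder sequences; the second lacks one residue.
    print(local_identity("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLNPQRSTVWY"))
```

Other conventions divide by the length of the shorter sequence or of the reference; whichever denominator is used should be stated alongside the value of s.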

Polypeptides of the invention can be prepared in many ways e.g. by chemical synthesis (at least in part), by digesting longer polypeptides using proteases, by translation from RNA, by purification from cell culture (e.g. from recombinant expression), from the organism itself (e.g. isolation from prostate tissue), from a cell line source etc.

Polypeptides of the invention can be prepared in various forms (e.g. native, fusions, glycosylated, non-glycosylated etc.).

Polypeptides of the invention may be attached to a solid support.

Polypeptides of the invention may comprise a detectable label (e.g. a radioactive or fluorescent label, or a biotin label).

In general, the polypeptides of the subject invention are provided in a non-naturally occurring environment e.g. they are separated from their naturally-occurring environment. In certain embodiments, the subject polypeptide is present in a composition that is enriched for the polypeptide as compared to a control. As such, purified polypeptide is provided, where by purified is meant that the polypeptide is present in a composition that is substantially free of other expressed polypeptides, where by substantially free is meant that less than 90%, usually less than 60% and more usually less than 50% of the composition is made up of other expressed polypeptides.

The term “polypeptide” refers to amino acid polymers of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The term also encompasses an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component. Also included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), as well as other modifications known in the art. Polypeptides can occur as single chains or associated chains. Polypeptides of the invention can be naturally or non-naturally glycosylated (i.e. the polypeptide has a glycosylation pattern that differs from the glycosylation pattern found in the corresponding naturally occurring polypeptide).

Mutants can include amino acid substitutions, additions or deletions. The amino acid substitutions can be conservative amino acid substitutions or substitutions to eliminate non-essential amino acids, such as to alter a glycosylation site, a phosphorylation site or an acetylation site, or to minimize misfolding by substitution or deletion of one or more cysteine residues that are not necessary for function. Conservative amino acid substitutions are those that preserve the general charge, hydrophobicity/hydrophilicity, and/or steric bulk of the amino acid substituted. Variants can be designed so as to retain or have enhanced biological activity of a particular region of the polypeptide (e.g. a functional domain and/or, where the polypeptide is a member of a polypeptide family, a region associated with a consensus sequence). Selection of amino acid alterations for production of variants can be based upon the accessibility (interior vs. exterior) of the amino acid (e.g. ref. 33), the thermostability of the variant polypeptide (e.g. ref. 34), desired glycosylation sites (e.g. ref. 35), desired disulfide bridges (e.g. refs. 36 & 37), desired metal binding sites (e.g. refs. 38 & 39), and desired substitutions within proline loops (e.g. ref. 40). Cysteine-depleted muteins can be produced as disclosed in reference 41.

C.4—Antibody Materials

The invention also provides isolated antibodies, or antigen-binding fragments thereof, that bind to a polypeptide of the invention. The invention also provides isolated antibodies, or antigen-binding fragments thereof, that bind to a polypeptide encoded by a polynucleotide of the invention.

Antibodies of the invention may be polyclonal or monoclonal and may be produced by any suitable means (e.g. by recombinant expression).

Antibodies of the invention may include a label. The label may be detectable directly, such as a radioactive or fluorescent label. Alternatively, the label may be detectable indirectly, such as an enzyme whose products are detectable (e.g. luciferase, β-galactosidase, peroxidase etc.).

Antibodies of the invention may be attached to a solid support.

Antibodies of the invention may be prepared by administering (e.g. injecting) a polypeptide of the invention to an appropriate animal (e.g. a rabbit, hamster, mouse or other rodent).

Antigen-binding fragments of antibodies include Fv, scFv, Fc, Fab, F(ab′)₂ etc.

To increase compatibility with the human immune system, the antibodies may be chimeric or humanized [e.g. refs. 42 & 43], or fully human antibodies may be used. Because humanized antibodies are far less immunogenic in humans than the original non-human monoclonal antibodies, they can be used for the treatment of humans with far less risk of anaphylaxis. Thus, these antibodies may be preferred in therapeutic applications that involve in vivo administration to a human, such as use as radiation sensitizers for the treatment of neoplastic disease or use in methods to reduce the side effects of cancer therapy.

Humanized antibodies may be achieved by a variety of methods including, for example: (1) grafting non-human complementarity determining regions (CDRs) onto a human framework and constant region (“humanizing”), with the optional transfer of one or more framework residues from the non-human antibody; (2) transplanting entire non-human variable domains, but “cloaking” them with a human-like surface by replacement of surface residues (“veneering”). In the present invention, humanized antibodies will include both “humanized” and “veneered” antibodies [44, 45, 46, 47, 48, 49, 50].

CDRs are amino acid sequences which together define the binding affinity and specificity of a Fv region of a native immunoglobulin binding site [e.g. refs. 51 & 52].

The phrase “constant region” refers to the portion of the antibody molecule that confers effector functions. In chimeric antibodies, mouse constant regions are substituted by human constant regions. The constant regions of humanized antibodies are derived from human immunoglobulins. The heavy chain constant region can be selected from any of the 5 isotypes: alpha, delta, epsilon, gamma or mu.

One method of humanizing antibodies comprises aligning the heavy and light chain sequences of a non-human antibody to human heavy and light chain sequences, replacing the non-human framework residues with human framework residues based on such alignment, molecular modeling of the conformation of the humanized sequence in comparison to the conformation of the non-human parent antibody, and repeated back mutation of residues in the framework region which disturb the structure of the non-human CDRs until the predicted conformation of the CDRs in the humanized sequence model closely approximates the conformation of the non-human CDRs of the parent non-human antibody. Such humanized antibodies may be further derivatized to facilitate uptake and clearance e.g. via Ashwell receptors [refs. 53 & 54].

Humanized or fully-human antibodies can also be produced using transgenic animals that are engineered to contain human immunoglobulin loci. For example, ref. 55 discloses transgenic animals having a human Ig locus wherein the animals do not produce functional endogenous immunoglobulins due to the inactivation of endogenous heavy and light chain loci. Ref. 56 also discloses transgenic non-primate mammalian hosts capable of mounting an immune response to an immunogen, wherein the antibodies have primate constant and/or variable regions, and wherein the endogenous immunoglobulin-encoding loci are substituted or inactivated. Ref. 57 discloses the use of the Cre/Lox system to modify the immunoglobulin locus in a mammal, such as to replace all or a portion of the constant or variable region to form a modified antibody molecule. Ref. 58 discloses non-human mammalian hosts having inactivated endogenous Ig loci and functional human Ig loci. Ref. 59 discloses methods of making transgenic mice in which the mice lack endogenous heavy chains, and express an exogenous immunoglobulin locus comprising one or more xenogeneic constant regions.

Using a transgenic animal described above, an immune response can be produced to a PCAV polypeptide, and antibody-producing cells can be removed from the animal and used to produce hybridomas that secrete human monoclonal antibodies. Immunization protocols, adjuvants, and the like are known in the art, and are used in immunization of, for example, a transgenic mouse as described in ref. 60. The monoclonal antibodies can be tested for the ability to inhibit or neutralize the biological activity or physiological effect of the corresponding polypeptide.

D—Comparison with Control Samples

D.1—The Control

HML-2 transcripts are up-regulated in tumors, including prostate tumors. To detect such up-regulation, a reference point is needed i.e. a control. Analysis of the control sample gives a standard level of RNA and/or protein expression against which a patient sample can be compared.

A negative control gives a background or basal level of expression against which a patient sample can be compared. Higher levels of expression product relative to a negative control indicate that the patient from whom the sample was taken has, for example, prostate cancer. Typically, for prostate cancer, for example, negative controls would include lifetime baseline levels of expression or the expression level observed in pooled normals. Conversely, equivalent levels of expression product indicate that the patient does not have a HML-2-related cancer such as prostate cancer.

A positive control gives a level of expression against which a patient sample can be compared. Equivalent or higher levels of expression product relative to a positive control indicate that the patient from whom the sample was taken has cancer such as prostate cancer. Conversely, lower levels of expression product indicate that the patient does not have a HML-2 related cancer such as prostate cancer.

For direct or indirect RNA measurement, or for direct polypeptide measurement, a negative control will generally comprise cells which are not from a tumor, e.g. a prostate tumor. For indirect polypeptide measurement, a negative control will generally be a blood sample from a patient who does not have a prostate tumor. The negative control could be a sample from the same patient as the patient sample, but from a tissue in which HML-2 expression is not up-regulated e.g. a non-tumor non-prostate cell. The negative control could be a prostate cell from the same patient as the patient sample, but taken at an earlier stage in the patient's life. The negative control could be a cell from a patient without a prostate tumor. This cell may or may not be a prostate cell. The negative control cell could be a prostate cell from a patient with BPH.

For direct or indirect RNA measurement, or for direct polypeptide measurement, a positive control will generally comprise cells from a tumor e.g. a prostate tumor. For indirect polypeptide measurement, a positive control will generally be a blood sample from a patient who has a prostate tumor. The positive control could be a prostate tumor cell from the same patient as the patient sample, but taken at an earlier stage in the patient's life (e.g. to monitor remission). The positive control could be a cell from another patient with a prostate tumor. The positive control could be a prostate cell line.

Other suitable positive and negative controls will be apparent to the skilled person.

HML-2 expression in the control can be assessed at the same time as expression in the patient sample. Alternatively, HML-2 expression in the control can be assessed separately (earlier or later).

Rather than actually compare two samples, however, the control may be an absolute value i.e. a level of expression which has been empirically determined from samples taken from prostate tumor patients (e.g. under standard conditions).

D.2—Degree of Up-Regulation

The up-regulation relative to the control (100%) will usually be at least 150% (e.g. 200%, 250%, 300%, 400%, 500%, 600% or more).
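
As a worked illustration of this arithmetic, the following Python sketch expresses a sample's expression level as a percentage of the control level (the control being 100%) and compares it with the 150% figure. The function names are hypothetical, the expression levels are arbitrary numbers in arbitrary units, and the result is only an indication of up-regulation relative to a negative control, not a diagnosis.

```python
def relative_expression(sample_level: float, control_level: float) -> float:
    """Return the sample level as a percentage of the control level."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return 100.0 * sample_level / control_level


def is_up_regulated(sample_level: float, control_level: float,
                    threshold_pct: float = 150.0) -> bool:
    """Return True if the sample reaches the threshold relative to the control."""
    return relative_expression(sample_level, control_level) >= threshold_pct


if __name__ == "__main__":
    # Arbitrary illustrative measurements (same units for sample and control).
    print(relative_expression(3.0, 2.0), is_up_regulated(3.0, 2.0))
    print(relative_expression(2.5, 2.0), is_up_regulated(2.5, 2.0))
```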

D.3—Diagnosis

The invention provides a method for diagnosing prostate cancer. It will be appreciated that “diagnosis” according to the invention can range from a definite clinical diagnosis of disease to an indication that the patient should undergo further testing which may lead to a definite diagnosis. For example, the method of the invention can be used as part of a screening process, with positive samples being subjected to further analysis.

Furthermore, diagnosis includes monitoring the progress of cancer in a patient already known to have the cancer. Cancer can also be staged by the methods of the invention. Preferably, the cancer is prostate cancer.

The efficacy of a treatment regimen for a cancer (therametrics) can also be monitored by the method of the invention.

Susceptibility to a cancer can also be detected e.g. where up-regulation of expression has occurred, but before cancer has developed. Prognostic methods are also encompassed.

All of these techniques fall within the general meaning of “diagnosis” in the present invention.

E—Pharmaceutical Compositions

The invention provides a pharmaceutical composition comprising polynucleotide, polypeptide, or antibody as defined above. The invention also provides their use as medicaments, and their use in the manufacture of medicaments for treating prostate cancer. The invention also provides a method for raising an immune response, comprising administering an immunogenic dose of polynucleotide or polypeptide of the invention to an animal.

Pharmaceutical compositions encompassed by the present invention include, as active agent, the polynucleotides, polypeptides, or antibodies of the invention disclosed herein in a therapeutically effective amount. An “effective amount” is an amount sufficient to effect beneficial or desired results, including clinical results. An effective amount can be administered in one or more administrations. For purposes of this invention, an effective amount is an amount that is sufficient to palliate, ameliorate, stabilize, reverse, slow or delay the symptoms and/or progression of prostate cancer.

The compositions can be used to treat cancer as well as metastases of primary cancer. In addition, the pharmaceutical compositions can be used in conjunction with conventional methods of cancer treatment, e.g. to sensitize tumors to radiation or conventional chemotherapy. The terms “treatment”, “treating”, “treat” and the like are used herein to generally refer to obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of a partial or complete stabilization or cure for a disease and/or adverse effect attributable to the disease. “Treatment” as used herein covers any treatment of a disease in a mammal, particularly a human, and includes: (a) preventing the disease or symptom from occurring in a subject which may be predisposed to the disease or symptom but has not yet been diagnosed as having it; (b) inhibiting the disease symptom, i.e. arresting its development; or (c) relieving the disease symptom, i.e. causing regression of the disease or symptom.

Where the pharmaceutical composition comprises an antibody that specifically binds to a gene product encoded by a differentially expressed polynucleotide, the antibody can be coupled to a drug for delivery to a treatment site or coupled to a detectable label to facilitate imaging of a site comprising cancer cells, such as prostate cancer cells. Methods for coupling antibodies to drugs and detectable labels are well known in the art, as are methods for imaging using detectable labels.

The term “therapeutically effective amount” as used herein refers to an amount of a therapeutic agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or antigen levels. Therapeutic effects also include reduction in physical symptoms. The precise effective amount for a subject will depend upon the subject's size and health, the nature and extent of the condition, and the therapeutics or combination of therapeutics selected for administration. The effective amount for a given situation is determined by routine experimentation and is within the judgment of the clinician. For purposes of the present invention, an effective dose will generally be from about 0.01 mg/kg to about 5 mg/kg, or about 0.01 mg/kg to about 50 mg/kg or about 0.05 mg/kg to about 10 mg/kg of the compositions of the present invention in the individual to which it is administered.

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term “pharmaceutically acceptable carrier” refers to a carrier for administration of a therapeutic agent, such as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any pharmaceutical carrier that does not itself induce the production of antibodies harmful to the individual receiving the composition, and which can be administered without undue toxicity. Suitable carriers can be large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill in the art. Pharmaceutically acceptable carriers in therapeutic compositions can include liquids such as water, saline, glycerol and ethanol. Auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, and the like, can also be present in such vehicles. Typically, the therapeutic compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection can also be prepared. Liposomes are included within the definition of a pharmaceutically acceptable carrier. Pharmaceutically acceptable salts can also be present in the pharmaceutical composition, e.g. mineral acid salts such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of pharmaceutically acceptable excipients is available in Remington: The Science and Practice of Pharmacy (1995) Alfonso Gennaro, Lippincott, Williams, & Wilkins.

The composition is preferably sterile and/or pyrogen-free. It will typically be buffered around pH 7.

Once formulated, the compositions contemplated by the invention can be (1) administered directly to the subject (e.g. as polynucleotide, polypeptides, small molecule agonists or antagonists, and the like); or (2) delivered ex vivo, to cells derived from the subject (e.g. as in ex vivo gene therapy). Direct delivery of the compositions will generally be accomplished by parenteral injection, e.g. subcutaneously, intraperitoneally, intravenously, intramuscularly, intratumorally or to the interstitial space of a tissue. Other modes of administration include oral and pulmonary administration, suppositories, and transdermal applications, needles, and gene guns or hyposprays. Dosage treatment can be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known in the art [e.g. ref. 61]. Cells useful in ex vivo applications include, for example, stem cells (particularly hematopoietic stem cells), lymph cells, macrophages, dendritic cells, or tumor cells.

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished by, for example, dextran-mediated transfection, calcium phosphate precipitation, polybrene-mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well known in the art.

Differential expression of PCAV polynucleotides has been found to correlate with prostate tumors. The tumor can be amenable to treatment by administration of a therapeutic agent based on the provided polynucleotide, corresponding polypeptide or other corresponding molecule (e.g. antisense, ribozyme, etc.). In other embodiments, the disorder can be amenable to treatment by administration of a small molecule drug that, for example, serves as an inhibitor (antagonist) of the function of the encoded gene product of a gene having increased expression in cancerous cells relative to normal cells or as an agonist for gene products that are decreased in expression in cancerous cells (e.g. to promote the activity of gene products that act as tumor suppressors).

The dose and the means of administration of the inventive pharmaceutical compositions are determined based on the specific qualities of the therapeutic composition, the condition, age, and weight of the patient, the progression of the disease, and other relevant factors. For example, administration of polynucleotide therapeutic compositions includes local or systemic administration, including injection, oral administration, particle gun or catheterized administration, and topical administration. Preferably, the therapeutic polynucleotide composition contains an expression construct comprising a promoter operably linked to a polynucleotide of the invention. Various methods can be used to administer the therapeutic composition directly to a specific site in the body. For example, a small metastatic lesion is located and the therapeutic composition injected several times in several different locations within the body of the tumor. Alternatively, arteries which serve a tumor are identified, and the therapeutic composition injected into such an artery, in order to deliver the composition directly into the tumor. A tumor that has a necrotic center is aspirated and the composition injected directly into the now empty center of the tumor. An antisense composition is directly administered to the surface of the tumor, for example, by topical application of the composition. X-ray imaging is used to assist in certain of the above delivery methods.

Targeted delivery of therapeutic compositions containing an antisense polynucleotide, subgenomic polynucleotides, or antibodies to specific tissues can also be used. Receptor-mediated DNA delivery techniques are described in, for example, references 62 to 67. Therapeutic compositions containing a polynucleotide are administered in a range of about 100 ng to about 200 mg of DNA for local administration in a gene therapy protocol. Concentration ranges of about 500 ng to about 50 mg, about 1 μg to about 2 mg, about 5 μg to about 500 μg, and about 20 μg to about 100 μg of DNA can also be used during a gene therapy protocol. Factors such as method of action (e.g. for enhancing or inhibiting levels of the encoded gene product) and efficacy of transformation and expression are considerations which will affect the dosage required for ultimate efficacy of the antisense subgenomic polynucleotides. Where greater expression is desired over a larger area of tissue, larger amounts of antisense subgenomic polynucleotides or the same amounts re-administered in a successive protocol of administrations, or several administrations to different adjacent or close tissue portions of, for example, a tumor site, may be required to effect a positive therapeutic outcome. In all cases, routine experimentation in clinical trials will determine specific ranges for optimal therapeutic effect.

The therapeutic polynucleotides and polypeptides of the present invention can be delivered using gene delivery vehicles. The gene delivery vehicle can be of viral or non-viral origin (see generally references 68, 69, 70 and 71). Expression of such coding sequences can be induced using endogenous mammalian or heterologous promoters. Expression of the coding sequence can be either constitutive or regulated.

Viral-based vectors for delivery of a desired polynucleotide and expression in a desired cell are well known in the art. Exemplary viral-based vehicles include, but are not limited to, recombinant retroviruses (e.g. references 72 to 82), alphavirus-based vectors (e.g. Sindbis virus vectors, Semliki forest virus (ATCC VR-67; ATCC VR-1247), Ross River virus (ATCC VR-373; ATCC VR-1246) and Venezuelan equine encephalitis virus (ATCC VR-923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532)), adenovirus vectors, and adeno-associated virus (AAV) vectors (e.g. see refs. 83 to 88). Administration of DNA linked to killed adenovirus [89] can also be employed.

Non-viral delivery vehicles and methods can also be employed, including, but not limited to, polycationic condensed DNA linked or unlinked to killed adenovirus alone [e.g. 89], ligand-linked DNA [90], eukaryotic cell delivery vehicles [e.g. refs. 91 to 95] and nucleic charge neutralization or fusion with cell membranes. Naked DNA can also be employed. Exemplary naked DNA introduction methods are described in refs. 96 and 97. Liposomes that can act as gene delivery vehicles are described in refs. 98 to 102. Additional approaches are described in refs. 103 & 104.

Further non-viral delivery suitable for use includes mechanical delivery systems such as the approach described in ref. 104. Moreover, the coding sequence and the product of expression of such can be delivered through deposition of photopolymerized hydrogel materials or use of ionizing radiation [e.g. refs. 105 & 106]. Other conventional methods for gene delivery that can be used for delivery of the coding sequence include, for example, use of hand-held gene transfer particle gun [107] or use of ionizing radiation for activating transferred gene [108 & 109].

Vaccine Compositions

The invention provides a composition comprising a polypeptide or polynucleotide of the invention and a pharmaceutically acceptable carrier.

The composition may additionally comprise an adjuvant. For example, the composition may comprise one or more of the following adjuvants: (1) oil-in-water emulsion formulations (with or without other specific immunostimulating agents such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a) MF59™ [110; Chapter 10 in ref. 111], containing 5% Squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing MTP-PE) formulated into submicron particles using a microfluidizer, (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5% pluronic-blocked polymer L121, and thr-MDP either microfluidized into a submicron emulsion or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system (RAS), (Ribi Immunochem, Hamilton, Mont.) containing 2% Squalene, 0.2% Tween 80, and one or more bacterial cell wall components from the group consisting of monophosphoryl lipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL+CWS (Detox™); (2) saponin adjuvants, such as QS21 or Stimulon™ (Cambridge Bioscience, Worcester, Mass.) may be used or particles generated therefrom such as ISCOMs (immunostimulating complexes), which ISCOMs may be devoid of additional detergent [112]; (3) Complete Freund's Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (4) cytokines, such as interleukins (e.g. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12 etc.), interferons (e.g. gamma interferon), macrophage colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc.; (5) monophosphoryl lipid A (MPL) or 3-O-deacylated MPL (3dMPL) [e.g. 113, 114]; (6) combinations of 3dMPL with, for example, QS21 and/or oil-in-water emulsions [e.g. 115, 116, 117]; (7) oligonucleotides comprising CpG motifs i.e. containing at least one CG dinucleotide, with 5-methylcytosine optionally being used in place of cytosine; (8) a polyoxyethylene ether or a polyoxyethylene ester [118]; (9) a polyoxyethylene sorbitan ester surfactant in combination with an octoxynol [119] or a polyoxyethylene alkyl ether or ester surfactant in combination with at least one additional non-ionic surfactant such as an octoxynol [120]; (10) an immunostimulatory oligonucleotide (e.g. a CpG oligonucleotide) and a saponin [121]; (11) an immunostimulant and a particle of metal salt [122]; (12) a saponin and an oil-in-water emulsion [123]; (13) a saponin (e.g. QS21)+3dMPL+IL-12 (optionally+a sterol) [124]; (14) aluminum salts, preferably hydroxide or phosphate, but any other suitable salt may also be used (e.g. hydroxyphosphate, oxyhydroxide, orthophosphate, sulphate etc. [chapters 8 & 9 of ref. 111]). Mixtures of different aluminum salts may also be used. The salt may take any suitable form (e.g. gel, crystalline, amorphous etc.); (15) chitosan; (16) cholera toxin or E. coli heat labile toxin, or detoxified mutants thereof [125]; (17) microparticles of poly(α-hydroxy)acids, such as PLG; (18) other substances that act as immunostimulating agents to enhance the efficacy of the composition. Aluminum salts and/or MF59™ are preferred.

The composition is preferably sterile and/or pyrogen-free. It will typically be buffered around pH 7.

The composition is preferably an immunogenic composition and is more preferably a vaccine composition. The composition can be used to raise antibodies in a mammal (e.g. a human).

Vaccines of the invention may be prophylactic (i.e. to prevent disease) or therapeutic (i.e. to reduce or eliminate the symptoms of a disease).

Efficacy can be tested by monitoring expression of polynucleotides and/or polypeptides of the invention after administration of the composition of the invention.

F—Screening Methods and Drug Design

The invention provides methods of screening for compounds with activity against cancer, comprising: contacting a test compound with a tissue sample derived from a cell in which HML-2 expression is up-regulated; or a cell line; and monitoring HML-2 expression in the sample. A decrease in expression indicates potential anti-cancer efficacy of the test compound.

The invention also provides methods of screening for compounds with activity against prostate cancer, comprising: contacting a test compound with a polynucleotide or polypeptide of the invention; and detecting a binding interaction between the test compound and the polynucleotide/polypeptide. A binding interaction indicates potential anti-cancer efficacy of the test compound.

The invention also provides methods of screening for compounds with activity against prostate cancer, comprising: contacting a test compound with a polypeptide of the invention; and assaying the function of the polypeptide. Inhibition of the polypeptide's function (e.g. loss of protease activity, loss of RNA export, loss of reverse transcriptase activity, loss of endonuclease activity, loss of integrase activity etc.) indicates potential anti-cancer efficacy of the test compound.

Typical test compounds include, but are not restricted to, peptides, peptoids, proteins, lipids, metals, nucleotides, nucleosides, small organic molecules, antibiotics, polyamines, and combinations and derivatives thereof. Small organic molecules have a molecular weight of more than 50 and less than about 2,500 daltons, and most preferably between about 300 and about 800 daltons. Complex mixtures of substances, such as extracts containing natural products, or the products of mixed combinatorial syntheses, can also be tested and the component that binds to the target RNA can be purified from the mixture in a subsequent step.

Test compounds may be derived from large libraries of synthetic or natural compounds. For instance, synthetic compound libraries are commercially available from Maybridge Chemical Co. (Trevillet, Cornwall, UK) or Aldrich (Milwaukee, Wis.). Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts may be used. Additionally, test compounds may be synthetically produced using combinatorial chemistry either as individual compounds or as mixtures.

Agonists or antagonists of the polypeptides of the invention can be screened using any available method known in the art, such as signal transduction, antibody binding, receptor binding, mitogenic assays, chemotaxis assays, etc. The assay conditions ideally should resemble the conditions under which the native activity is exhibited in vivo, that is, under physiologic pH, temperature, and ionic strength. Suitable agonists or antagonists will exhibit strong inhibition or enhancement of the native activity at concentrations that do not cause toxic side effects in the subject. Agonists or antagonists that compete for binding to the native polypeptide can require concentrations equal to or greater than the native concentration, while inhibitors capable of binding irreversibly to the polypeptide can be added in concentrations on the order of the native concentration.

Such screening and experimentation can lead to identification of an agonist or antagonist of a HML-2 polypeptide. Such agonists and antagonists can be used to modulate, enhance, or inhibit HML-2 expression and/or function [126].

The present invention relates to methods of using the polypeptides of the invention (e.g. recombinantly produced HML-2 polypeptides) to screen compounds for their ability to bind or otherwise modulate (e.g. inhibit) the activity of HML-2 polypeptides, and thus to identify compounds that can serve, for example, as agonists or antagonists of the HML-2 polypeptides. In one screening assay, the HML-2 polypeptide is incubated with cells susceptible to the growth stimulatory activity of HML-2, in the presence and absence of a test compound. The HML-2 activity altering or binding potential of the test compound is measured. Growth of the cells is then determined. A reduction in cell growth in the test sample indicates that the test compound binds to and thereby inactivates the HML-2 polypeptide, or otherwise inhibits the HML-2 polypeptide activity.

Transgenic animals (e.g. rodents) that have been transformed to over-express HML-2 genes can be used to screen compounds in vivo for the ability to inhibit development of tumors resulting from HML-2 over-expression or to treat such tumors once developed. Transgenic animals that have prostate tumors of increased invasive or malignant potential can be used to screen compounds, including antibodies or peptides, for their ability to inhibit the effect of HML-2 polypeptides. Such animals can be produced, for example, as described in the examples herein.

Screening procedures such as those described above are useful for identifying agents for their potential use in pharmacological intervention strategies in prostate cancer treatment. Additionally, polynucleotide sequences corresponding to HML-2, including LTRs, may be used to assay for inhibitors of elevated gene expression.

Potent inhibitors of HERV-K protease are already known [127]. Inhibition of HERV-K protease by HIV-1 protease inhibitors has also been reported [128]. These compounds can be studied for use in prostate cancer therapy, and are also useful lead compounds for drug design.

Transdominant negative mutants of cORF have also been reported [129,130]. Transdominant cORF mutants can be studied for use in prostate cancer therapy.

Antisense oligonucleotides complementary to HML-2 mRNA can be used to selectively diminish or ablate the expression of the polypeptide. More specifically, antisense constructs or antisense oligonucleotides can be used to inhibit the production of HML-2 polypeptide(s) in prostate tumor cells. Antisense mRNA can be produced by transfecting into target cancer cells an expression vector with a HML-2 polynucleotide of the invention oriented in an antisense direction relative to the direction of PCAV-mRNA transcription. Appropriate vectors include viral vectors, including retroviral vectors, as well as non-viral vectors. Alternatively, antisense oligonucleotides can be introduced directly into target cells to achieve the same goal. Oligonucleotides can be selected/designed to achieve the highest level of specificity and, for example, to bind to a PCAV-mRNA at the initiator ATG.

Monoclonal antibodies to HML-2 polypeptides can be used to block the action of the polypeptides and thereby control growth of cancer cells. This can be accomplished by infusion of antibodies that bind to HML-2 polypeptides and block their action.

The invention also provides high-throughput screening methods for identifying compounds that bind to a polynucleotide or polypeptide of the invention. Preferably, all the biochemical steps for this assay are performed in a single solution in, for instance, a test tube or microtitre plate, and the test compounds are analyzed initially at a single compound concentration. For the purposes of high throughput screening, the experimental conditions are adjusted to achieve a proportion of test compounds identified as “positive” compounds from amongst the total compounds screened. The assay is preferably set to identify compounds with an appreciable affinity towards the target e.g., when 0.1% to 1% of the total test compounds from a large compound library are shown to bind to a given target with a K_(i) of 10 μM or less (e.g. 1 μM, 100 nM, 10 nM, or less).
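The idea of tuning a screen so that a fixed proportion of compounds is scored as positive can be sketched as a simple ranking step. This is a minimal sketch under stated assumptions: the function name, the compound identifiers and the signal values are hypothetical, and a real assay would set its cutoff from controls and binding constants rather than from rank alone.

```python
def select_hits(signals, hit_fraction=0.005):
    """Return the compound names with the highest signals.

    signals: dict mapping compound name to a single-concentration readout
             (higher is assumed to mean stronger binding).
    hit_fraction: proportion of the library to call positive, e.g. 0.001 to 0.01
                  for the 0.1% to 1% range mentioned in the text.
    """
    if not 0 < hit_fraction <= 1:
        raise ValueError("hit_fraction must be in (0, 1]")
    # Always report at least one compound, even for very small libraries.
    n_hits = max(1, round(len(signals) * hit_fraction))
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n_hits]]


# Hypothetical library of 1000 compounds with made-up signal values.
library = {f"cmpd{i}": float(i) for i in range(1000)}
hits = select_hits(library, hit_fraction=0.005)
```

With a 0.5% hit fraction and a 1000-compound library, five compounds are carried forward for follow-up at multiple concentrations.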

G—The HML-2 Family of Human Endogenous Retroviruses

Genomes of all eukaryotes contain multiple copies of sequences related to infectious retroviruses. These endogenous retroviruses have been well studied in mice where both true infectious forms and thousands of defective retrovirus-like elements (e.g. the IAP and Etn sequence families) exist. Some members of the IAP and Etn families are “active” retrotransposons since insertions of these elements have been documented which cause germ line mutations or oncogenic transformation.

Endogenous retroviruses were identified in human genomic DNA by their homology to retroviruses of other vertebrates [131, 132]. It is believed that the human genome probably contains numerous copies of endogenous proviral DNAs, but little is known about their function. Most HERV families have relatively few members (1-50) but one family (HERV-H) consists of ˜1000 copies per haploid genome distributed on all chromosomes. The large numbers and general transcriptional activity of HERVs in embryonic and tumor cell lines suggest that they could act as disease-causing insertional mutagens or affect adjacent gene expression in a neutral or beneficial way.

The K family of human endogenous retroviruses (HERV-K) is well known [133]. It is related to the mouse mammary tumor virus (MMTV) and is present in the genomes of humans, apes and old world monkeys, but several HERV-K proviruses are unique to humans [134]. The HERV-K family is present at 30-50 full-length copies per haploid human genome and possesses long open reading frames that potentially are translated into viral proteins [135, 136]. Two types of proviral genomes are known, which differ by the presence (type 2) or absence (type 1) of a stretch of 292 nucleotides in the overlapping boundary of the pol and env genes [137]. Some members of the HERV-K family are known to code for the gag protein and retroviral particles, which are both detectable in germ cell tumors and derived cell lines [138]. Analysis of the RNA expression pattern of full-length HERV-K has also identified a doubly-spliced RNA that encodes a 105 amino acid protein termed central ORF (‘cORF’) which is a sequence-specific nuclear RNA export factor that is functionally equivalent to the Rev protein of HIV [139]. HERV-K10 has been shown to encode a full-length gag homologous 73 kDa protein and a functional protease [140].

Patients suffering from germ cell tumors show high antibody titers against HERV-K gag and env proteins at the time of tumor detection [141]. In normal testis and testicular tumors the HERV-K transmembrane envelope protein has been detected both in germ cells and tumor cells, but not in the surrounding tissue. In the case of testicular tumor, correlations between the expression of the env-specific mRNA, the presence of the transmembrane env, cORF and gag proteins and antibodies against HERV-K specific peptides in the serum of the patients, have been reported. Reference 142 reports that HERV-K10 gag and/or env proteins are synthesized in seminoma cells and that patients with those tumors exhibit relatively high antibody titers against gag and/or env.

Gag proteins released in the form of particles from HERV-K have been identified in the cell culture supernatant of the teratocarcinoma derived cell line Tera 1. These retrovirus-like particles (termed “human teratocarcinoma derived virus” or HTDV) have been shown to have a 90% sequence homology to the HERV-K10 genome [138, 143].

While the HERV-K family is present in the genome of every human cell, a high level of expression of mRNAs, proteins and particles is observed only in human teratocarcinoma cell lines [144]. In other tissues and cell lines, only a basal level of expression of mRNA has been demonstrated even using very sensitive methods. The expression of retroviral proviruses is generally regulated by elements of the 5′ long terminal repeat (LTR). Furthermore, the activation of expression of an endogenous retrovirus may trigger the expression of a downstream gene that triggers a neoplastic effect.

The sequence of HERV-K(II), which locates to chromosome 3, has been disclosed [145].

HML-2 is a subgroup of the HERV-K family [146]. HERV isolates which are members of the HML-2 subgroup include HERV-K10 [137,142], the 27 HML-2 viruses shown in FIG. 4 of reference 147, HERV-K(C7) [148], HERV-K(II) [145], and HERV-K(CH). Table 11 provides a list of all known members of the HML-2 subgroup of the HERV-K family as determined by searching the DoubleTwist database containing all genomic contigs with the sequence AF074086 using the Smith-Waterman algorithm with the default parameters: open gap penalty=−20 and extension penalty=−5.
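For readers unfamiliar with the search method named above, the following is a minimal sketch of Smith-Waterman local alignment scoring with affine gap penalties, using the stated gap opening penalty of −20 and extension penalty of −5. The match and mismatch scores (+5 and −4) are assumptions, since the text does not state a substitution matrix, and the convention that the first gapped position costs the opening penalty and each further position costs the extension penalty is likewise an assumption. It is not the database search implementation itself.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4, gap_open=-20, gap_extend=-5):
    """Best local alignment score of a and b with affine gaps (Gotoh recurrences)."""
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols        # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols  # F: best score ending in a gap that consumes a[i-1]
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # E: best score ending in a gap that consumes b[j-1]
        for j in range(1, cols):
            # Open a new gap from H, or extend the gap already in progress.
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


# Toy sequences (hypothetical): the second carries a one-base insertion.
left, right = "ACGTACGTAC", "TGCATGCATG"
score_gapped = smith_waterman_score(left + right, left + "A" + right)
```

In the toy example, bridging the single inserted base costs 20 but joins two 10-base matching blocks worth 50 each, so the gapped alignment outscores either block alone.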

The invention is based on the finding that HML-2 mRNA expression is up-regulated in prostate tumors. Because HML-2 is a well-recognized family, the skilled person will be able to determine without difficulty whether any particular endogenous retrovirus is or is not a HML-2. Preferred members of the HML-2 family for use in accordance with the present invention are those whose proviral genome has an LTR which has at least 75% sequence identity to SEQ ID NO:150 (the LTR sequence from HML-2.HOM [1]). Example LTRs include SEQ ID NOS:151-154.

H—HERV-K(CH)

The present invention is based on the discovery of elevated levels of multiple HML-2 polynucleotides in prostate tumor samples as compared to normal prostate tissue. One particular HML-2 whose mRNA was found to be up-regulated is designated herein as ‘HERV-K(CH)’.

Sequences from HERV-K(CH) are shown in SEQ ID NOS:14-39 and have been deposited with the ATCC (see Table 7). The skilled person will be able to classify any further HERV as HERV-K(CH) or not based on sequence identity to these HERV-K(CH) polynucleotides. Preferably such a comparison is to one or more, or all, of the polynucleotide sequences disclosed herein or of the polynucleotide inserts in the ATCC-deposited isolates. Alternatively, the skilled artisan can determine the sequence identity based on a comparison to any one or more, or all, of the sequences in SEQ ID NOS:7-10 and SEQ ID NOS:14-39 taking into consideration the spontaneous mutation rate associated with retroviral replication. Thus, it will be apparent when the differences in the sequences are consistent with a HERV-K(CH) isolate or consistent with another HERV.

HERV-K(CH) is therefore a specific member of the HML-2 subgroup which can be used in the invention as described above. It can also be used in methods previously described in relation to HERV-K e.g. the diagnosis of testicular cancer [142], autoimmune diseases, multiple sclerosis [149], insulin-dependent diabetes mellitus (IDDM) [150] etc.

H.1—HERV-K(CH) Nucleic Acids

H.1.1-HERV-K(CH) Genomic Sequences

The invention provides an isolated polynucleotide comprising: (a) the nucleotide sequence of any of SEQ ID NOS:7-10; (b) the nucleotide sequence of any of SEQ ID NOS:27-39; (c) the complement of a nucleotide sequence of any of SEQ ID NOS:7-10; or (d) the complement of the nucleotide sequence of any of SEQ ID NOS:27-39.

H.1.2—HERV-K(CH) Fragments

The invention also provides an isolated polynucleotide comprising a fragment of: (a) a nucleotide sequence shown in SEQ ID NOS:7-10; (b) the nucleotide sequence shown in any of SEQ ID NOS:27-39; (c) the complement of a nucleotide sequence shown in SEQ ID NOS:7-10; or (d) the complement of the nucleotide sequence shown in any of SEQ ID NOS:27-39.

The fragment is preferably at least x nucleotides in length, wherein x is at least 7 (e.g. at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, 100 etc.). The value of x may be between about 150 and about 200 or be between about 250 and about 300. The value of x may be about 350, about 400, about 450, about 500, about 550, about 600, about 650, about 700, or about 750. The value of x may be less than 2000 (e.g. less than 1000, 500, 100, or 50).

The fragment is preferably neither one of the following sequences nor a fragment of one of the following sequences: (i) the nucleotide sequence shown in SEQ ID NO:42; (ii) the nucleotide sequence shown in SEQ ID NO:43; (iii) the nucleotide sequence shown in SEQ ID NO:44; (iv) the nucleotide sequence shown in SEQ ID NO:45; (v) a known polynucleotide; or (vi) a polynucleotide known as of 7 Dec. 2000 (e.g. a polynucleotide available in a public database such as GenBank or GeneSeq before 7 Dec. 2000).
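The combination of a minimum fragment length x and the exclusion of fragments that occur in other sequences can be illustrated with a short sketch. The function name and the toy sequences below are hypothetical and merely stand in for the sequences of the sequence listing; complements are not considered here.

```python
def novel_fragments(source, excluded, x=15):
    """List (position, fragment) for each distinct x-nucleotide window of source
    that does not occur within any sequence in excluded."""
    seen = set()
    result = []
    for i in range(len(source) - x + 1):
        fragment = source[i:i + x]
        if fragment in seen:
            continue  # report each distinct fragment once, at its first position
        seen.add(fragment)
        if not any(fragment in other for other in excluded):
            result.append((i, fragment))
    return result


# Toy data (hypothetical): the first 10 bases of the source also occur elsewhere.
source_seq = "AAAACCCCGGGGTTTT"
excluded_seqs = ["AAAACCCCGG"]
fragments = novel_fragments(source_seq, excluded_seqs, x=8)
```

Only windows that extend beyond the shared region survive the filter, which mirrors the preference for fragments that are not fragments of the excluded sequences.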

The fragment is preferably a contiguous sequence of one of the polynucleotides of (a), (b), (c) or (d) that remains unmasked following application of a masking program for masking low complexity (e.g. XBLAST) to the sequence (i.e. one would select an unmasked region, as indicated by the polynucleotides outside the poly-n stretches of the masked sequence produced by the masking program).
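Selecting unmasked regions from the output of a masking program can be sketched as follows. The sketch assumes that masked positions are written as runs of the letter N (in either case), as the reference to poly-n stretches suggests; the function name, the minimum length and the example sequence are hypothetical.

```python
import re


def unmasked_regions(masked_seq, min_len=7):
    """Return (start, end, subsequence) for each run without N of at least min_len.

    start is inclusive and end is exclusive, using 0-based coordinates.
    """
    regions = []
    for found in re.finditer(r"[^Nn]+", masked_seq):
        if found.end() - found.start() >= min_len:
            regions.append((found.start(), found.end(), found.group()))
    return regions


# Hypothetical masked sequence with two poly-N stretches.
masked = "ACGTACGTNNNNNNACGNNNNTTGCATGCA"
regions = unmasked_regions(masked, min_len=7)
```

The short run between the two masked stretches is dropped because it falls below the minimum fragment length, leaving only the two flanking regions as candidates.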

These polynucleotides are particularly useful as probes. In general, a probe in which x=15 represents sufficient sequence for unique identification. Probes can be used, for example, to determine the presence or absence of a polynucleotide of the invention (or variants thereof) in a sample. By using probes, particularly labeled probes of DNA sequences, one can isolate homologous or related genes. The source of homologous genes can be any species e.g. primate species, particularly human; rodents, such as rats and mice; canines; felines; bovines; ovines; equines; yeast; nematodes; etc.

Probes from more than one polynucleotide sequence of the invention can hybridize with the same nucleic acid if the nucleic acid from which they were derived corresponds to a single sequence (e.g. more than one can hybridize to a single cDNA derived from the same mRNA).

Preferred fragments (e.g. for the identification of HERV-K(CH) polynucleotides associated with cancer) which do not correspond identically in their entirety to any portion of the sequence(s) shown in SEQ ID NOS:42-45 are: SEQ ID NO:59 (from gag region), SEQ ID NOS:60-70 (from pol region) and SEQ ID NOS:71-82 (from 3′ pol region).

Preferred fragments (e.g. for the simultaneous identification of HERV-K(CH) polynucleotides, HERV-KII polynucleotides and/or HERV-K10 polynucleotides) which do correspond identically in their entirety to any portion of the sequence(s) shown in SEQ ID NOS:44 & 45 are SEQ ID NOS:83 & 84 (from gag region).

Polynucleotide probes unique to HERV-K(CH), HERV-KII and HERV-K10 gag regions are provided in Table 1; polynucleotide probes unique to HERV-K(CH), HERV-KII, and HERV-K10 protease 3′ and polymerase 5′ regions are provided in Table 2; polynucleotide probes unique to HERV-K(CH), HERV-KII, and HERV-K10 3′ pol only regions are provided in Table 3.

H.1.3—HERV-K(CH) Fragments Plus Heterologous Sequences

The invention also provides an isolated polynucleotide comprising (a) a segment that is a fragment of the sequence shown in SEQ ID NOS:7-10 or SEQ ID NOS:27-39, wherein (i) said fragment is at least 10 nucleotides in length and (ii) corresponds identically in its entirety to a portion of SEQ ID NO:44 and/or 45; and, optionally, (b) one or more segments flanking the segment defined in (a), wherein the presence of said optional segment(s) causes said polynucleotide to not correspond identically to any portion of a sequence shown in SEQ ID NOS:7-10 or SEQ ID NOS:27-39. In some embodiments, the optional flanking segments share less than 40% sequence identity to the nucleic acid sequences shown in SEQ ID NOS:7-10, SEQ ID NO:44 and/or SEQ ID NO:45. In other embodiments, the optional flanking segments have no contiguous sequence of 10, 12, 15 or 20 nucleotides in common with SEQ ID NOS:7-10, SEQ ID NO:44 and/or SEQ ID NO:45. In yet other embodiments, the optional flanking segment is not present. In further embodiments, a fragment of the polynucleotide sequence is up to at least 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 1000, or 1500 nucleotides in length.

The invention also provides an isolated polynucleotide having formula 5′-A-B-C-3′, wherein: A is a nucleotide sequence consisting of a nucleotides; B is a nucleotide sequence consisting of a fragment of b nucleotides from (i) the nucleotide sequence shown in SEQ ID NOS:7-10, (ii) the nucleotide sequence shown in any of SEQ ID NOS:27-39, (iii) the complement of the nucleotide sequence shown in SEQ ID NOS:7-10, or (iv) the complement of the nucleotide sequence shown in any of SEQ ID NOS:27-39; C is a nucleotide sequence consisting of c nucleotides; and wherein said polynucleotide is not a fragment of (i) the nucleotide sequence shown in SEQ ID NOS:7-10, (ii) the nucleotide sequence shown in any of SEQ ID NOS:27-39, (iii) the complement of the nucleotide sequence shown in SEQ ID NOS:7-10, or (iv) the complement of the nucleotide sequence shown in any of SEQ ID NOS:27-39.

In this polynucleotide, a+c is at least 1 (e.g. at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.) and b is at least 7 (e.g. at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.). It is preferred that the value of a+b+c is at least 9 (e.g. at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.). It is preferred that the value of a+b+c is at most 200 (e.g. at most 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9).
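The length constraints on the 5′-A-B-C-3′ formula can be checked mechanically. The following minimal sketch encodes the broadest stated limits (a+c at least 1, b at least 7) together with the preferred total length of 9 to 200; the function names and toy sequences are hypothetical, and only the forward strand of a single source sequence is checked, not its complement.

```python
def abc_lengths_ok(a, b, c, min_flank=1, min_b=7, min_total=9, max_total=200):
    """Check the length limits on a 5'-A-B-C-3' construct.

    a, b, c are the lengths of segments A, B and C in nucleotides.
    """
    if min(a, b, c) < 0:
        return False
    total = a + b + c
    return a + c >= min_flank and b >= min_b and min_total <= total <= max_total


def is_valid_construct(a_seq, b_seq, c_seq, source):
    """B must be a fragment of source, and the whole construct must not be one."""
    construct = a_seq + b_seq + c_seq
    return (
        b_seq in source
        and construct not in source
        and abc_lengths_ok(len(a_seq), len(b_seq), len(c_seq))
    )


# Toy example (hypothetical): a 10-nt fragment with a 3-nt 5' flank.
toy_source = "ACGTTGCAACGGTTAACC"
valid = is_valid_construct("GGG", "TTGCAACGGT", "", toy_source)
```

A flank that merely continues the source sequence fails the second condition, because the resulting construct would itself be a fragment of the source.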

A and/or C may comprise a promoter sequence (or its complement).

H.1.4—Homologous Sequences

The invention provides a polynucleotide having at least s % identity to: (a) SEQ ID NOS:7-10; (b) a fragment of x nucleotides of SEQ ID NOS:7-10; (c) SEQ ID NOS:11-13; (d) a fragment of x nucleotides of SEQ ID NOS:11-13. The value of s is at least 50 (e.g. at least 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 99.5, 99.9 etc.). The value of x is at least 7 (e.g. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.).
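As a rough illustration of an "s % identity" threshold, the sketch below computes percent identity for two sequences that have already been aligned to equal length, with "-" marking gap positions. This counting convention (matches divided by aligned columns) is an assumption made for the example; the alignment programs named elsewhere in this disclosure may count differently. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy aligned sequences, invented for illustration.
s_value = 90.0
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity, identity >= s_value)
```

The percentage of mismatched positions is simply 100 minus this value under the same convention.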

These polynucleotides include naturally-occurring variants (e.g. degenerate variants, allelic variants, etc.), homologs, orthologs, and functional mutants.

Variants can be identified by hybridization of putative variants with the polynucleotide sequences disclosed in SEQ ID NOS:14-39 herein, preferably by hybridization under stringent conditions. For example, by using appropriate wash conditions, variants can be identified where the allelic variant exhibits at most about 25-30% base pair (bp) mismatches relative to the selected polynucleotide probe. In general, allelic variants contain 15-25% bp mismatches, and can contain as little as 5-15%, or 2-5%, or 1-2% bp mismatches, or even a single bp mismatch.

The invention also encompasses homologs corresponding to any one of thepolynucleotide sequences provided herein, where the source of homologousgenes can be any mammalian species (e.g. primate species, particularlyhuman; rodents, such as rats, etc.). Between mammalian species (e.g.human and primate), homologs generally have substantial sequencesimilarity (e.g. at least 75% sequence identity, usually at least 90%,more usually at least 95%) between nucleotide sequences. Sequencesimilarity is calculated based on a reference sequence, which may be asubset of a larger sequence, such as a conserved motif, coding region,flanking region, domain, etc. A reference sequence will usually be atleast about 18 contiguous nt long, more usually at least about 30 ntlong, and may extend to the complete sequence that is being compared.Algorithms for sequence analysis are known in the art.

A preferred HERV-K(CH) isolate is an isolate sequence which is shown in SEQ ID NOS:7-10. Another preferred class of HERV-K(CH) isolates are those having a nucleotide sequence identity of at least 90%, preferably at least 95% to the 3′ polymerase region shown in SEQ ID NO:13 which relates to integrase, as measured by the alignment program GCG Gap (Suite Version 10.1) using the default parameters: open gap=3 and extend gap=1. Another preferred class of HERV-K(CH) isolates are those having a nucleotide sequence identity of at least 98%, more preferably at least 99% to the 5′ polymerase region shown in SEQ ID NO:12 which relates to reverse transcriptase, as measured by the alignment program GCG Gap (Suite Version 10.1) using the default parameters: open gap=3 and extend gap=1. Another typical classification of the relationship of retroviruses is based on the amino acid sequence similarities in the reverse transcriptase protein. Thus, an even more preferred class of HERV-K(CH) isolates are those having an amino acid sequence identity of at least 90%, more preferably 95% to the 5′ polymerase region encoded by the nucleotide sequence shown in SEQ ID NO:12, as determined by the Smith-Waterman homology search algorithm using an affine gap search with a gap open penalty of 12 and a gap extension penalty of 2, BLOSUM matrix of 62. Thus, these prostate cancer-associated polynucleotide sequences define a class of human endogenous retroviruses, designated herein as HERV-K(CH), whose members comprise variations which, without wishing to be bound by theory, may be due to the presence of polymorphisms or allelic variations.
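For readers unfamiliar with an affine gap search, the following is a minimal sketch of Smith-Waterman local alignment scoring with affine gaps (Gotoh's recurrences), using the gap open penalty of 12 and gap extension penalty of 2 recited above. Two points are assumptions of this example and not statements about the software actually used: a gap of length k is charged 12 + 2(k-1), and a simple match/mismatch score (+5/-4) stands in for the BLOSUM62 matrix so that the block stays self-contained. All names and sequences are hypothetical.

```python
from typing import Callable


def smith_waterman_affine(s: str, t: str,
                          score: Callable[[str, str], int],
                          gap_open: int = 12, gap_extend: int = 2) -> int:
    """Best local alignment score with affine gap penalties (Gotoh recurrences)."""
    neg_inf = float("-inf")
    n, m = len(s), len(t)
    # h: best score ending at (i, j); e/f: best score ending in a gap.
    h = [[0] * (m + 1) for _ in range(n + 1)]
    e = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    f = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap, or open a new one from h.
            e[i][j] = max(e[i][j - 1] - gap_extend, h[i][j - 1] - gap_open)
            f[i][j] = max(f[i - 1][j] - gap_extend, h[i - 1][j] - gap_open)
            diag = h[i - 1][j - 1] + score(s[i - 1], t[j - 1])
            h[i][j] = max(0, diag, e[i][j], f[i][j])  # 0 restarts the local alignment
            best = max(best, h[i][j])
    return best


def simple_score(x: str, y: str) -> int:
    """Stand-in for a substitution matrix: +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4


# Toy sequences: t is s with two extra residues inserted in the middle.
s = "A" * 10 + "C" * 10
t = "A" * 10 + "GG" + "C" * 10
print(smith_waterman_affine(s, t, simple_score))
```

Substituting a real BLOSUM62 lookup for `simple_score` would leave the recurrences unchanged; only the per-pair scores differ.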

H.1.5—HERV-K(CH) Hybridizable Sequences

The invention provides an isolated polynucleotide comprising a polynucleotide that selectively hybridizes, relative to a known polynucleotide, to: (a) the nucleotide sequence shown in SEQ ID NOS:7-10; (b) the nucleotide sequence shown in any of SEQ ID NOS:27-39; (c) the complement of the nucleotide sequence shown in SEQ ID NOS:7-10; (d) the complement of the nucleotide sequence shown in any of SEQ ID NOS:27-39; (e) a fragment of the nucleotide sequence shown in SEQ ID NOS:7-10; (f) a fragment of the nucleotide sequence shown in any of SEQ ID NOS:27-39; (g) the complement of a fragment of the nucleotide sequence shown in SEQ ID NOS:7-10; (h) the complement of a fragment of the nucleotide sequence shown in any of SEQ ID NOS:27-39; (j) a nucleotide sequence shown in SEQ ID NOS:14-39; or (k) polynucleotides found in ATCC deposits having ATCC accession numbers given in Table 7. The fragment of (e), (f), (g) or (h) is preferably at least x nucleotides in length, wherein x is as defined in H.1.2 above, and is preferably not one of the sequences (i), (ii), (iii), (iv), (v) or (vi) as defined in H.1.2 above.

Hybridization reactions can be performed under conditions of different “stringency”, as described in B.4 above. In some embodiments, the polynucleotide hybridizes under low stringency conditions; in other embodiments it hybridizes under intermediate stringency conditions; in other embodiments, it hybridizes under high stringency conditions.

H.1.6—Deposited HERV-K Sequences

The invention also provides an isolated polynucleotide comprising: (a) a HERV-K(CH) cDNA insert as deposited at the ATCC and having an ATCC accession number given in Table 7; (b) a HERV-K(CH) sequence as shown in any one of SEQ ID NOS:14-26; (c) a HERV-K(CH) sequence as shown in any one of SEQ ID NOS:27-39; or (d) a fragment of (a), (b) or (c). The fragment of (d) is preferably at least x nucleotides in length, wherein x is at least 7 (e.g. at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.).

H.1.7—Preferred HERV-K(CH) Sequences

Preferred polynucleotides of the invention are those having a sequenceset forth in any one of the polynucleotide sequences SEQ ID NOS:7-10 andSEQ ID NOS:14-39 provided herein; polynucleotides obtained from thebiological materials described herein, in particular, polynucleotidesequences present in the isolates deposited with the ATCC and havingATCC accession numbers given in Table 7 or other biological sources(particularly human sources) or by hybridization to the above mentionedsequences under stringent conditions (particularly conditions of highstringency); genes corresponding to the provided polynucleotides;variants of the provided polynucleotides and their corresponding genesparticularly those variants that retain a biological activity of theencoded gene product (e.g. a biological activity ascribed to a geneproduct corresponding to the provided polynucleotides as a result of theassignment of the gene product to a protein family(ies) and/oridentification of a functional domain present in the gene product).Other polynucleotides and polynucleotide compositions contemplated byand within the scope of the present invention will be readily apparentto one of ordinary skill in the art when provided with the disclosurehere.

H.1.8—General Features of Polynucleotides of the Invention

General features of the polynucleotides described in this section H.1 are the same as those described in section B.4 above.

The isolated polynucleotides preferably comprise a polynucleotide having a HERV-K(CH) sequence.

A polynucleotide of the invention can encode all or a part of a polypeptide, such as the gag region, 5′ pol region or 3′ pol region of a human endogenous retrovirus. Double or single stranded fragments can be obtained from the DNA sequence by chemically synthesizing oligonucleotides in accordance with conventional methods, by restriction enzyme digestion, by PCR amplification, etc.

Polynucleotides of the invention can be cDNAs or genomic DNAs, as wellas fragments thereof, particularly fragments that encode a biologicallyactive gene product and/or are useful in the methods disclosed herein(e.g. in diagnosis, as a unique identifier of a differentially expressedgene of interest, etc.). The term “cDNA” as used herein is intended toinclude all nucleic acids that share the arrangement of sequenceelements found in native mature mRNA species, where sequence elementsare exons and 3′ and 5′ non-coding regions. Normally mRNA species havecontiguous exons, with the intervening introns, when present, beingremoved by nuclear RNA splicing, to create a continuous open readingframe encoding a polypeptide. mRNA species can also exist with bothexons and introns, where the introns may be removed by alternativesplicing. Furthermore it should be noted that different species of mRNAsencoded by the same genomic sequence can exist at varying levels in acell, and detection of these various levels of mRNA species can beindicative of differential expression of the encoded gene product in thecell.

A genomic sequence of interest comprises the nucleic acid present between the initiation codon and the stop codon, as defined in the listed sequences, including all of the introns that are normally present in a native chromosome. It can further include the 3′ and 5′ untranslated regions found in the mature mRNA. It can further include specific transcriptional and translational regulatory sequences, such as promoters, enhancers, etc., including about 1 kb, but possibly more, of flanking genomic DNA at either the 5′ or 3′ end of the transcribed region. The genomic DNA can be isolated as a fragment of 100 kbp or smaller, and substantially free of flanking chromosomal sequence. The genomic DNA flanking the coding region, either 3′ or 5′, or internal regulatory sequences as sometimes found in introns, contains sequences required for proper tissue, stage-specific, or disease-state specific expression.

Polynucleotides of the invention can be provided as linear molecules orwithin circular molecules, and can be provided within autonomouslyreplicating molecules (vectors) or within molecules without replicationsequences. Expression of the polynucleotides can be regulated by theirown or by other regulatory sequences known in the art. Thepolynucleotides can be introduced into suitable host cells using avariety of techniques available in the art, such as transferrinpolycation-mediated DNA transfer, transfection with naked orencapsulated nucleic acids, liposome-mediated DNA transfer,intracellular transportation of DNA-coated latex beads, protoplastfusion, viral infection, electroporation, gene gun, calciumphosphate-mediated transfection, and the like.

A polynucleotide sequence that is “shown in” or “depicted in” a SEQ IDNO or Figure means that the sequence is present as an identicalcontiguous sequence in the SEQ ID NO or Figure. The term encompassesportions, or regions of the SEQ ID NO or Figure as well as the entiresequence contained within the SEQ ID NO or Figure.

H.2—HERV-K(CH) Polypeptides

H.2.1—HERV-K(CH) Open Reading Frames

The invention provides an isolated polypeptide: (a) encoded within a HERV-K(CH) open reading frame; (b) encoded by a polynucleotide shown in SEQ ID NO:11, 12 or 13; or (c) comprising an amino acid sequence as shown in any one of SEQ ID NOS:46-49, 50-55, 56-57 or 58.

Deduced polypeptides encoded by the HERV-K(CH) polynucleotides of the invention include the gag translations shown in SEQ ID NOS:46-49 and the 3′ pol translations shown in SEQ ID NOS:50-55. A polypeptide sequence encoded by the polynucleotide having the sequence shown in SEQ ID NO:15 is provided in SEQ ID NO:56; a polypeptide sequence encoded by the polynucleotide having the sequence shown in SEQ ID NO:14 is shown in SEQ ID NO:57. A consensus 3′ pol polypeptide sequence encoded by the polynucleotides having the sequence shown in SEQ ID NOS:21-27, inclusive, is provided in SEQ ID NO:58.

The polypeptides encompassed by the present invention include thoseencoded by polynucleotides of the invention, e.g. SEQ ID NOS:7-10 andSEQ ID NOS:14-39, as well as polynucleotides deposited with the ATCC asdisclosed herein, as well as nucleic acids that, by virtue of thedegeneracy of the genetic code, are not identical in sequence to thedisclosed polynucleotides and encode the polypeptides. Thus, theinvention includes within its scope a polypeptide encoded by apolynucleotide having the sequence of any one of the polynucleotidesequences provided herein, or a variant thereof.

Over-expression of the polynucleotides associated with prostate tumors has been observed, and elevated levels of expression of the polypeptides encoded by these polynucleotides may likewise play a role in prostate tumors.

Typically, in retroviruses, a single large gag polypeptide is synthesized (e.g. a 73 kDa gag protein in HERV-K10) which is subsequently cleaved into multiple functional peptides by a functional protease encoded by the pol or protease region of the genome. Overexpression of sequences corresponding to both gag and pol domains of the HERV-K(CH) suggests such a mechanism. Sequences corresponding to the env and the nuclear RNA transport protein cORF region of the HERV-K(CH) genome may also be overexpressed. The polypeptides encoded by the open reading frames within the over-expressed polynucleotide sequences may play a significant role in the progression of prostate tumors.

The detection of these polypeptides by antibodies or other reagents that specifically recognize them may aid in the early diagnosis of prostate tumor or any other cancers associated with the overexpression of these HERV-K(CH) sequences.

Furthermore, inhibition of the function of these polypeptides may suggest means for therapy and treatment of prostatic or other HERV-K(CH) sequence related cancers. One method of accomplishing such inhibition is by administration of vaccines as a preventative therapy or antibody-mediated drug therapy as a post-neoplasia regimen for treatment of such cancers.

H.2.2—HERV-K(CH) Fragments

The invention provides an isolated polypeptide comprising a fragment of: (a) a polypeptide sequence encoded within a HERV-K(CH) open reading frame; (b) a polypeptide sequence encoded by a polynucleotide shown in SEQ ID NO:11, 12 or 13; or (c) an amino acid sequence as shown in any one of SEQ ID NOS:46-49, 50-55, 56-57 or 58.

The fragment is preferably at least x amino acids in length, wherein x is at least 5 (e.g. at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, 100, 125, 150, 200, 300, 400, 500 or more). The value of x will typically not exceed 1000.

The fragment may include an epitope, e.g. an epitope of the amino acid sequence shown in SEQ ID NOS:56, 57 or 58.

SEQ ID NOS:46-49 provide a translation of the HERV-K(CH) polynucleotides having a sequence shown in SEQ ID NOS:14, 15, 16 and 40 (the sequence of SEQ ID NO:40 is from a polynucleotide found in a normal prostate library) corresponding to polynucleotides encoding the gag region. SEQ ID NOS:50-55 provide a translation of the HERV-K(CH) polynucleotides having a sequence shown in SEQ ID NOS:21-26, inclusive, corresponding to the 3′ region of pol. SEQ ID NOS:56 & 57 provide translations of the HERV-K(CH) polynucleotide of SEQ ID NO:15 and SEQ ID NO:14, respectively. SEQ ID NO:58 provides a consensus translation of the polynucleotide from the 3′ pol region (SEQ ID NOS:21-26, inclusive). Encompassed within the present invention are polypeptide fragments, such as epitopes, of at least 5 amino acids, at least 6 amino acids, at least 8 amino acids, at least 10 amino acids, at least 11 amino acids, at least 12 amino acids, at least 13 amino acids, at least 14 amino acids and at least 15 amino acids of the translations shown in SEQ ID NOS:46-49 and 50-55. In a preferred embodiment, the HERV-K(CH) epitopes of the amino acid sequence as shown in SEQ ID NOS:56-58 were determined by the Jameson-Wolf antigenic index [21].

The following regions in 3′ pol (SEQ ID NO:58) were determined to be antigenic by the Jameson-Wolf algorithm: amino acids: 1-10; 15-35; 45-55; 60-85; 100-115; 125-140; 170-190; 195-215; 230-268. Additional epitope-containing fragments include amino acids 1-8; 2-10; 1-15; 5-15; 7-15; 10-20; 12-20; 15-23; 20-28; 28-35; 15-30; 15-40; 20-30; 45-52; 48-55; 60-68; 60-70; 65-73; 70-78; 75-83; 70-80; 65-75; 68-75; 75-85; 78-85; 65-85; 60-75; 100-108; 103-110; 105-113; 108-115; 125-133; 128-135; 132-140; 170-178; 175-182; 180-187; 182-190; 195-202; 200-208; 205-212; 208-215; 230-237; 235-242; 240-247; 245-252; 250-257; 255-262; 260-268; 230-250; 235-255; 240-260; 245-268; 230-245; 235-245; 235-250; 240-255; 245-260; 250-268; 15-55; 170-215; 45-85.

The following regions in gag (SEQ ID NO:56) were determined to be antigenic by the Jameson-Wolf algorithm: amino acids: 1-40; 45-60; 80-105; 130-145; 147-183; 186-220; 245-253; 255-288. Additional epitope-containing fragments include amino acids 1-8; 2-10; 1-15; 5-15; 7-15; 10-20; 12-20; 15-23; 20-28; 28-35; 30-37; 33-40; 1-20; 20-40; 1-15; 15-30; 15-40; 45-52; 50-57; 55-62; 50-60; 1-60; 80-87; 85-92; 80-90; 90-97; 95-102; 98-105; 85-100; 90-105; 80-100; 85-105; 130-137; 135-142; 140-147; 145-152; 150-157; 155-162; 160-167; 165-172; 170-177; 175-183; 180-187; 185-192; 190-197; 195-202; 200-207; 205-212; 210-217; 213-220; 185-220; 190-220; 195-220; 200-220; 205-220; 255-262; 260-267; 265-272; 270-277; 275-282; 280-288; 245-288; 250-288; 260-288; 265-288; 270-288.

The following regions in gag (SEQ ID NO:57) were determined to be antigenic by the Jameson-Wolf algorithm: amino acids: 1-40; 80-105; 145-180; 185-225; 240-335. Additional epitope-containing fragments include amino acids 1-8; 2-10; 1-15; 5-15; 7-15; 10-20; 12-20; 15-23; 20-28; 28-35; 30-37; 33-40; 1-20; 20-40; 1-15; 15-30; 15-40; 80-87; 85-92; 80-90; 90-97; 95-102; 98-105; 85-100; 90-105; 80-100; 85-105; 145-152; 150-157; 155-162; 160-167; 165-172; 170-177; 175-182; 180-187; 185-192; 190-197; 195-202; 200-207; 205-212; 210-217; 215-212; 218-225; 145-160; 150-165; 155-170; 160-175; 170-185; 180-225; 185-225; 190-225; 195-225; 200-225; 205-225; 210-225; 215-225; 240-247; 245-252; 250-257; 255-262; 260-267; 265-272; 270-277; 275-282; 280-287; 285-292; 290-297; 295-302; 300-307; 305-312; 310-317; 315-322; 320-327; 325-332; 328-335; 245-285; 250-285; 260-285; 265-285; 270-295; 275-300; 280-305; 285-310; 295-315; 300-320; 305-325; 325-335; 245-335; 250-335; 255-335; 260-335; 270-335; 275-335; 280-335; 285-335; 290-335; 295-335; 305-335; 310-335; 315-335; 320-335.
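The residue ranges listed above are 1-based and inclusive, so turning a range list into actual fragments is a matter of parsing and slicing. The sketch below is illustrative only: the function names are hypothetical, and the 20-residue sequence is a toy string, not any of SEQ ID NOS:56-58. Ranges whose end precedes their start, or that are otherwise malformed, are rejected rather than silently repaired.

```python
from typing import List, Tuple


def parse_ranges(text: str) -> List[Tuple[int, int]]:
    """Parse a '1-10; 15-35' style list into (start, end) pairs (1-based, inclusive)."""
    ranges = []
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        start_text, end_text = item.split("-")  # raises ValueError on e.g. '98-1-05'
        start, end = int(start_text), int(end_text)
        if start < 1 or end < start:
            raise ValueError(f"invalid range: {item}")
        ranges.append((start, end))
    return ranges


def extract_fragments(sequence: str, ranges: List[Tuple[int, int]]) -> List[str]:
    """Return the sub-sequence for each range that fits within the sequence."""
    return [sequence[start - 1:end] for start, end in ranges if end <= len(sequence)]


# Toy 20-residue sequence, invented for illustration.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
toy_ranges = parse_ranges("1-8; 2-10; 12-20")
print(extract_fragments(toy_sequence, toy_ranges))
```

Running the parser over a full range list is also a quick way to catch transcription errors in the list itself.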

H.2.3—HERV-K(CH) Fragments Plus Heterologous Sequences

The invention also provides an isolated polypeptide having formula 5′-A-B-C-3′, wherein: A is an amino acid sequence consisting of a amino acids; B is an amino acid sequence consisting of a fragment of b amino acids from (i) the amino acid sequence encoded by a polynucleotide shown in SEQ ID NO:11, 12 or 13; (ii) any one of SEQ ID NOS:46-49, 50-55, 56-57 or 58; C is an amino acid sequence consisting of c amino acids; and wherein said polypeptide is not a fragment of the amino acid sequence defined in (i) or (ii).

In this polypeptide, a+c is at least 1 (e.g. at least 2, 3, 4, 5, 6, 7,8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.) and b isat least 7 (e.g. at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.).It is preferred that the value of a+b+c is at least 9 (e.g. at least 10,11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40,45, 50, 60, 70, 80, 90, 100 etc.). It is preferred that the value ofa+b+c is at most 200 (e.g. at most 190, 180, 170, 160, 150, 140, 130,120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 25, 20, 19, 18, 17, 16, 15,14, 13, 12, 11, 10, 9).

H.2.4—Homologous Sequences

The invention provides a polypeptide having at least s % identity to: (a) the polypeptide sequences encoded by SEQ ID NOS:7-45; (b) a fragment of x amino acids of the polypeptide sequences encoded by SEQ ID NOS:7-45; (c) the polypeptide sequences SEQ ID NOS:46-58; (d) a fragment of x amino acids of the polypeptide sequences SEQ ID NOS:46-58. The value of s is at least 35 (e.g. at least 40, 45, 50, 55, 60, 65, 70, 75, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 99.5, 99.9 etc.). The value of x is at least 7 (e.g. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 etc.).

These polypeptides include naturally-occurring variants (e.g. allelic variants, etc.), homologs, orthologs, and functional mutants.

The invention thus encompasses variants of the naturally-occurringpolypeptides, wherein such variants are homologous or substantiallysimilar to the naturally occurring polypeptide, and can be of an originof the same or different species as the naturally occurring polypeptide(e.g. human, murine, or some other species that naturally expresses therecited polypeptide, usually a mammalian species). These polypeptidevariants are encoded by polynucleotides that are within the scope of theinvention, and the genetic code can be used to select appropriate codonsto construct the corresponding variants.

H.2.5—Preferred HERV-K(CH) Sequences

The invention provides polypeptides, such as those shown in SEQ IDNOS:46-58, encoded by HERV-K(CH) polynucleotides that are differentiallyexpressed in prostate cancer cells. Such polypeptides are referred toherein as “polypeptides associated with prostate cancer” or “HERV-K(CH)polypeptides”. The polypeptides can be used to generate antibodiesspecific for a polypeptide associated with prostate cancer, whichantibodies are in turn useful in diagnostic methods, prognostic methods,therametric methods, and the like as discussed in more detail herein.Polypeptides are also useful as targets for therapeutic intervention, asdiscussed in more detail herein.

Preferred polypeptides are encoded by polynucleotides of the invention.

H.2.6—General Features of Polypeptides of the Invention

General features of the polypeptides described in this section H.2 are the same as those described in section C.3 above.

The isolated polypeptides of the invention preferably comprise a polypeptide having a HERV-K(CH) sequence.

Polypeptides, such as polypeptides of the gag regions or polypeptides ofthe pol regions, encoded by the polynucleotides disclosed herein, suchas polynucleotides having the sequences as shown in SEQ ID NOS:7-10 andSEQ ID NOS:14-39, and in isolates deposited with the ATCC and havingATCC accession numbers given in Table 7 and/or their corresponding fulllength genes, can be used to screen peptide libraries to identifybinding partners, such as receptors, from among the encodedpolypeptides. Peptide libraries can be synthesized according to methodsknown in the art (e.g. see refs. 151 & 152).

In general, the term “polypeptide” as used herein refers to the full length polypeptide encoded by the recited polynucleotide, the polypeptide encoded by the gene represented by the recited polynucleotide, as well as portions or fragments thereof.

A polypeptide sequence that is “shown in” or “depicted in” a SEQ ID NO or Figure means that the sequence is present as an identical contiguous sequence in the SEQ ID NO or Figure. The term encompasses portions, or regions of the SEQ ID NO or Figure as well as the entire sequence contained within the SEQ ID NO or Figure.

H.3—Anti-HERV-K(CH) Antibodies

The present invention also provides isolated antibodies or antigenbinding fragments thereof, that bind to a polypeptide of the presentinvention. The present invention also provides isolated antibodies orantigen binding fragments thereof, that bind to a polypeptide encoded bya polynucleotide of the present invention. The present invention alsoprovides isolated antibodies that bind to a polypeptide of theinvention, or antigen binding fragment thereof, encoded by apolynucleotide made by the method comprising the following steps i)immunizing a host animal with a composition comprising said polypeptideof the present invention, or antigen binding fragment thereof, and ii)collecting cells from said host expressing antibodies against theantigen or antigen binding fragment thereof. The present invention alsoprovides isolated antibodies that bind to a polypeptide, or antigenbinding fragment thereof, encoded by a polynucleotide of the presentinvention made by the method comprising the following steps: providing acell line producing an antibody, wherein said antibody binds to apolypeptide of the present invention, or antigen binding fragmentthereof, encoded by a polynucleotide of the present invention andculturing said cell line under conditions wherein said antibodies areproduced. In additional embodiments, the antibodies are collected andmonoclonal antibodies are produced using the collected host cells orgenetic material derived from the collected host cells. In additionalembodiments, the antibody is a polyclonal antibody. In a furtherembodiment, the antibody is attached to a solid surface or furthercomprises a detectable label.

The present invention further provides antibodies, which may be isolatedantibodies, that bind a polypeptide encoded by a polynucleotidedescribed herein. Antibodies can be provided in a composition comprisingthe antibody and a buffer and/or a pharmaceutically acceptableexcipient. Antibodies specific for a polypeptide associated with cancerare useful in a variety of diagnostic and therapeutic methods, asdiscussed in detail herein.

Expression products of a polynucleotide described herein, as well as thecorresponding mRNA (particularly mRNAs having distinct secondary and/ortertiary structures), cDNA, or complete gene, or fragments of saidexpression products can be prepared and used for raising antibodies forexperimental, diagnostic, and therapeutic purposes. For polynucleotidesto which a corresponding gene has not been assigned, this provides anadditional method of identifying the corresponding gene. Thepolynucleotide or related cDNA is expressed as described above, andantibodies are prepared. These antibodies are specific to an epitope onthe polypeptide encoded by the polynucleotide, and can precipitate orbind to the corresponding native polypeptide in a cell or tissuepreparation or in a cell-free extract of an in vitro expression system.

Polyclonal or monoclonal antibodies to the HERV-K(CH) polypeptides or an epitope thereof can be made for use in immunoassays by any of a number of methods known in the art. By “epitope”, reference is made to an antigenic determinant of a polypeptide. The presence of an epitope is demonstrated by the ability of an antibody to bind a polypeptide with specificity. Two antibodies are considered to be directed to the same epitope if they cross-block each other's binding to the same polypeptide.

One approach for preparing antibodies to a polypeptide is the selection and preparation of an amino acid sequence of all or part of the polypeptide, chemically synthesizing the sequence and injecting it into an appropriate animal, typically a rabbit, hamster or a mouse.

Oligopeptides can be selected as candidates for the production of an antibody to the HERV-K(CH) polypeptide based upon the oligopeptides lying in hydrophilic regions, which are thus likely to be exposed in the mature polypeptide. Additional oligopeptides can be determined using, for example, the Antigenicity Index [30].

In other embodiments of the present invention, humanized monoclonalantibodies are provided, wherein the antibodies are specific forHERV-K(CH) polypeptides and do not appreciably bind other HERVpolypeptides. The phrase “humanized antibody” refers to an antibodyderived from a non-human antibody, typically a mouse monoclonalantibody. Alternatively, a humanized antibody may be derived from achimeric antibody that retains or substantially retains theantigen-binding properties of the parental, non-human, antibody butwhich exhibits diminished immunogenicity in humans as compared to theparental antibody. The phrase “chimeric antibody,” as used herein,refers to an antibody containing sequence derived from two differentantibodies (see, e.g. ref. 153) which typically originate from differentspecies. Most typically, chimeric antibodies comprise human and murineantibody fragments, generally human constant and mouse variable regions.

In the present invention, HERV-K(CH) polypeptides of the invention and variants thereof are used to immunize a transgenic animal as described above. Monoclonal antibodies are made using methods known in the art, and the specificity of the antibodies is tested using isolated HERV-K(CH) polypeptides.

Methods for preparation of the human or primate HERV-K(CH) or an epitope thereof include, but are not limited to, chemical synthesis, recombinant DNA techniques or isolation from biological samples. Chemical synthesis of a peptide can be performed, for example, by the classical Merrifield method of solid phase peptide synthesis [154] or the FMOC strategy on a Rapid Automated Multiple Peptide Synthesis system (E. I. du Pont de Nemours Company, Wilmington, Del.) [155].

Polyclonal antibodies can be prepared by immunizing rabbits or other animals by injecting antigen followed by subsequent boosts at appropriate intervals. The animals are bled and sera assayed against purified HERV-K(CH), usually by ELISA or by bioassay based upon the ability to block the action of HERV-K(CH). When using avian species, e.g. chicken, turkey and the like, the antibody can be isolated from the yolk of the egg. Monoclonal antibodies can be prepared after the method of Milstein and Kohler by fusing splenocytes from immunized mice with continuously replicating tumor cells such as myeloma or lymphoma cells [156, 157, 158]. The hybridoma cells so formed are then cloned by limiting dilution methods and supernates assayed for antibody production by ELISA, RIA or bioassay.

The unique ability of antibodies to recognize and specifically bind to target polypeptides provides an approach for treating an overexpression of the polypeptide. Thus, another aspect of the present invention provides for a method for preventing or treating diseases involving overexpression of a HERV-K(CH) polypeptide by treatment of a patient with specific antibodies to the HERV-K(CH) polypeptide.

Specific antibodies, either polyclonal or monoclonal, to the HERV-K(CH)polypeptides can be produced by any suitable method known in the art asdiscussed above. For example, murine or human monoclonal antibodies canbe produced by hybridoma technology or, alternatively, the HERV-K(CH)polypeptides, or an immunologically active fragment thereof, or ananti-idiotypic antibody, or fragment thereof can be administered to ananimal to elicit the production of antibodies capable of recognizing andbinding to the HERV-K(CH) polypeptides. Such antibodies can be from anyclass of antibodies including, but not limited to IgG, IgA, IgM, IgD,and IgE or in the case of avian species, IgY and from any subclass ofantibodies.

H.4—HERV-K(CH) Vectors and Host Cells

The present invention also encompasses vectors and host cells comprising an isolated polynucleotide of the present invention.

H.5—HERV-K(CH) Kits, Libraries and Arrays

The invention provides kits, electronic libraries and arrays comprising polynucleotides of the invention, for use in diagnosing the presence of cancer in a test sample.

In general, a library of polynucleotides is a collection of sequenceinformation, which information is provided in either biochemical form(e.g. as a collection of polynucleotide molecules), or in electronicform (e.g. as a collection of polynucleotide sequences stored in acomputer-readable form, as in a computer system and/or as part of acomputer program). The sequence information of the polynucleotides canbe used in a variety of ways, e.g. as a resource for gene discovery, asa representation of sequences expressed in a selected cell type (e.g.cell type markers), and/or as markers of a given disease or diseasestate. In general, a disease marker is a representation of a geneproduct that is present in all cells affected by disease either at anincreased or decreased level relative to a normal cell (e.g. a cell ofthe same or similar type that is not substantially affected by disease).For example, a polynucleotide sequence in a library can be apolynucleotide that represents an mRNA, polypeptide, or other geneproduct encoded by the polynucleotide, that is either over-expressed orunder-expressed in a tissue affected by cancer, such as prostate cancerrelative to a normal (i.e. substantially disease-free) tissue, such asnormal prostate tissue.

The nucleotide sequence information of the library can be embodied in any suitable form, e.g. electronic or biochemical forms. For example, a library of sequence information embodied in electronic form comprises an accessible computer data file (or, in biochemical form, a collection of nucleic acid molecules) that contains the representative nucleotide sequences of genes that are differentially expressed (e.g. over-expressed or under-expressed) as between, for example, i) a cancerous cell and a normal cell; ii) a cancerous cell and a dysplastic cell; iii) a cancerous cell and a cell affected by a disease or condition other than cancer; iv) a metastatic cancerous cell and a normal cell and/or non-metastatic cancerous cell; v) a malignant cancerous cell and a non-malignant cancerous cell (or a normal cell) and/or vi) a dysplastic cell relative to a normal cell. Other combinations and comparisons of cells affected by various diseases or stages of disease will be readily apparent to the ordinarily skilled artisan. Biochemical embodiments of the library include a collection of nucleic acids that have the sequences of the genes in the library, where the nucleic acids can correspond to the entire gene in the library or to a fragment thereof, as described in greater detail below.

The polynucleotide libraries of the subject invention generally comprise sequence information of a plurality of polynucleotide sequences, where at least one of the polynucleotides has a sequence of any of the sequences described herein. By plurality is meant at least 2, usually at least 3, and can include up to all of the sequences described herein. The length and number of polynucleotides in the library will vary with the nature of the library, e.g. if the library is an oligonucleotide array, a cDNA array, a computer database of the sequence information, etc.

Where the library is an electronic library, the nucleic acid sequence information can be present in a variety of media. “Media” refers to a manufacture, other than an isolated nucleic acid molecule, that contains the sequence information of the present invention. Such a manufacture provides the genome sequence or a subset thereof in a form that can be examined by means not directly applicable to the sequence as it exists in a nucleic acid. For example, the nucleotide sequence of the present invention, e.g. the nucleic acid sequences of any of the polynucleotides of the sequences described herein, can be recorded on computer readable media, e.g. any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as a floppy disc, a hard disc storage medium, and a magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. One of skill in the art can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising a recording of the present sequence information. “Recorded” refers to a process for storing information on computer readable medium, using any such methods as known in the art. Any convenient data storage structure can be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc. In addition to the sequence information, electronic versions of libraries comprising one or more sequences described herein can be provided in conjunction or connection with other computer-readable information and/or other types of computer-readable files (e.g. searchable files and executable files, including, but not limited to, search program software).
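As an illustration only, the recording of sequence information in a plain text data structure can be sketched as follows. The FASTA-style layout, the function names and the placeholder sequences are assumptions chosen for the example; they are not taken from the specification and do not represent any sequence described herein.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (identifier, sequence) pairs to a text handle in a FASTA-style layout."""
    for identifier, sequence in records:
        handle.write(f">{identifier}\n")
        # Wrap the sequence so that each text line holds at most `width` residues.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


def read_fasta(handle):
    """Read FASTA-style text back into a dict mapping identifier to sequence."""
    library = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            library[current] = ""
        elif current is not None:
            library[current] += line
    return library


# Hypothetical placeholder entries, not sequences from the specification.
example_records = [("example_1", "ATGGCCTTAGGC"), ("example_2", "GGATCCAAGT")]

buffer = io.StringIO()  # stands in for a file on any computer readable medium
write_fasta(example_records, buffer, width=5)
buffer.seek(0)
restored = read_fasta(buffer)
```

Reading the recorded text back yields the same identifiers and sequences that were written, which is the property an electronic library relies on regardless of the storage medium chosen.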

By providing the nucleotide sequence in computer readable form, the information can be accessed for a variety of purposes. Computer software to access sequence information is publicly available. For example, the gapped BLAST [159] and BLAZE [160] search algorithms on a Sybase system can be used to identify open reading frames (ORFs) within the genome that contain homology to ORFs from other organisms.

As used herein, “a computer-based system” refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention. The data storage means can comprise any manufacture comprising a recording of the present sequence information as described above, or a memory access means that can access such a manufacture.

“Search means” refers to one or more programs implemented on the computer-based system, to compare a target sequence or target structural motif, or expression levels of a polynucleotide in a sample, with the stored sequence information. Search means can be used to identify fragments or regions of the genome that match a particular target sequence or target motif. A variety of algorithms are publicly known and commercially available, e.g. MacPattern (EMBL), BLASTN and BLASTX (NCBI). A “target sequence” can be any polynucleotide or amino acid sequence of six or more contiguous nucleotides or two or more amino acids, preferably from about 10 to 100 amino acids or from about 30 to 300 nt. A variety of comparing means can be used to accomplish comparison of sequence information from a sample (e.g. to analyze target sequences, target motifs, or relative expression levels) with the data storage means. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer based systems of the present invention to accomplish comparison of target sequences and motifs. Computer programs to analyze expression levels in a sample and in controls are also known in the art.
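By way of a minimal sketch, a search means can be illustrated with an exact-match scan of stored sequences for a target sequence. This is a deliberate simplification: the homology search programs named above use alignment-based scoring rather than exact matching. The function name, the library contents and the target are hypothetical; only the minimum target length of six nucleotides is taken from the text.

```python
def find_target(library, target, min_length=6):
    """Return {identifier: [0-based start positions]} for exact matches of target.

    `library` maps identifiers to stored nucleotide sequences. Overlapping
    matches are reported. Identifiers without a match are omitted.
    """
    target = target.upper()
    if len(target) < min_length:
        raise ValueError(f"target must have at least {min_length} nucleotides")
    hits = {}
    for identifier, sequence in library.items():
        sequence = sequence.upper()
        positions = []
        start = sequence.find(target)
        while start != -1:
            positions.append(start)
            # Resume one position later so overlapping matches are found too.
            start = sequence.find(target, start + 1)
        if positions:
            hits[identifier] = positions
    return hits


# Hypothetical stored sequences, not sequences from the specification.
example_library = {
    "example_1": "ATGGCCTTAGGCCTTAGG",
    "example_2": "GGATCCAAGT",
}
hits = find_target(example_library, "GCCTTA")
```

In this example the target occurs twice in the first stored sequence and not at all in the second, so only the first identifier is reported.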

A “target structural motif,” or “target motif,” refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration that is formed upon the folding of the target motif, or on consensus sequences of regulatory or active sites. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzyme active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, hairpin structures, promoter sequences and other expression elements such as binding sites for transcription factors.

A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. One format for an output means ranks the relative expression levels of different polynucleotides. Such presentation provides a skilled artisan with a ranking of relative expression levels to determine a gene expression profile.
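A ranked output of this kind can be sketched in a few lines. The gene names and expression values below are hypothetical and serve only to show the ordering step.

```python
def rank_expression(levels):
    """Return (name, level) pairs ordered from highest to lowest expression."""
    return sorted(levels.items(), key=lambda item: item[1], reverse=True)


# Hypothetical relative expression levels in arbitrary units.
example_levels = {"gene_a": 12.5, "gene_b": 340.0, "gene_c": 48.2}

ranking = rank_expression(example_levels)
for position, (name, level) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {level}")
```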

As discussed above, the “library” as used herein also encompasses biochemical libraries of the polynucleotides of the sequences described herein, e.g. collections of nucleic acids representing the provided polynucleotides. The biochemical libraries can take a variety of forms, e.g. a solution of cDNAs, a pattern of probe nucleic acids stably associated with a surface of a solid support (i.e. an array) and the like. Of particular interest are nucleic acid arrays in which one or more of the genes described herein is represented by a sequence on the array. By array is meant an article of manufacture that has at least a substrate with at least two distinct nucleic acid targets on one of its surfaces, where the number of distinct nucleic acids can be considerably higher, typically being at least 10 nt, usually at least 20 nt and often at least 25 nt. A variety of different array formats have been developed and are known to those of skill in the art. The arrays of the subject invention find use in a variety of applications, including gene expression analysis, drug screening, mutation analysis and the like, as disclosed in the above-listed exemplary patent documents.

In addition to the above nucleic acid libraries, analogous libraries of polypeptides are also provided, where the polypeptides of the library will represent at least a portion of the polypeptides encoded by a gene corresponding to a sequence described herein.

Polynucleotide arrays provide a high throughput technique that can assay a large number of polynucleotides or polypeptides in a sample. This technology can be used as a tool to test for differential expression. A variety of methods of producing arrays, as well as variations of these methods, are known in the art and contemplated for use in the invention. For example, arrays can be created by spotting polynucleotide probes onto a substrate (e.g. glass, nitrocellulose, etc.) in a two-dimensional matrix or array having bound probes. The probes can be bound to the substrate by either covalent bonds or by non-specific interactions, such as hydrophobic interactions. Samples of polynucleotides can be detectably labeled (e.g. using radioactive or fluorescent labels) and then hybridized to the probes. Double stranded polynucleotides, comprising the labeled sample polynucleotides bound to probe polynucleotides, can be detected once the unbound portion of the sample is washed away. Alternatively, the polynucleotides of the test sample can be immobilized on the array, and the probes detectably labeled. Techniques for constructing arrays and methods of using these arrays are described in, for example, references 161 to 177.

Arrays can be used to, for example, examine differential expression of genes and can be used to determine gene function. For example, arrays can be used to detect differential expression of a gene corresponding to a polynucleotide described herein, where expression is compared between a test cell and control cell (e.g. cancer cells and normal cells). For example, high expression of a particular message in a cancer cell, which is not observed in a corresponding normal cell, can indicate a cancer specific gene product. Exemplary uses of arrays are further described in, for example, references 178 and 179. Furthermore, many variations on methods of detection using arrays are well within the skill in the art and within the scope of the present invention. For example, rather than immobilizing the probe to a solid support, the test sample can be immobilized on a solid support which is then contacted with the probe.

A gene or polynucleotide is differentially expressed in a cancer cell when the polynucleotide is detected at higher or lower levels in cancer compared with a cell of the same cell type that is not cancerous. Typically, screening for polynucleotides differentially expressed focuses on a polynucleotide that is expressed such that, for example, mRNA is found at levels at least about 25%, at least about 50% to about 75%, at least about 90%, preferably at least about 2-fold, more preferably at least about 5-fold, at least about 10-fold, or at least about 50-fold or more, higher (e.g. overexpressed) or lower (e.g. underexpressed) in a cancer cell when compared with a cell of the same cell type that is not cancerous. The comparison can be made between two tissues, for example, if one is using in situ hybridization or another assay method that allows some degree of discrimination among cell types in the tissue. The comparison may also be made between cells removed from their tissue source. Thus, a polypeptide encoded by a polynucleotide that is differentially expressed in a cancer cell would be of clinical significance with respect to cancer.
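The threshold comparison described above can be sketched as a simple ratio test. The default threshold of 2-fold corresponds to the preferred level mentioned in the text; treating underexpression symmetrically (a ratio at or below the reciprocal of the threshold) is an assumption made for this example, as are the function name and the sample values.

```python
def classify_expression(cancer_level, normal_level, fold_threshold=2.0):
    """Classify a polynucleotide by the ratio of cancer to normal expression.

    Returns a (label, ratio) pair. Both levels must be positive so that the
    ratio is defined.
    """
    if cancer_level <= 0 or normal_level <= 0:
        raise ValueError("expression levels must be positive")
    ratio = cancer_level / normal_level
    if ratio >= fold_threshold:
        return "overexpressed", ratio
    # Assumption: underexpression is judged by the reciprocal threshold.
    if ratio <= 1.0 / fold_threshold:
        return "underexpressed", ratio
    return "not differential", ratio


# Hypothetical mRNA levels in arbitrary units.
label, ratio = classify_expression(cancer_level=10.0, normal_level=2.0)
print(label, ratio)
```

A less stringent screen, such as the 25% level also mentioned above, would correspond to calling the function with fold_threshold=1.25.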

In one preferred embodiment of the present invention, an array comprises at least two polynucleotides, each having a sequence selected from the group consisting of SEQ ID NOS:14-39 and polynucleotides present in isolates deposited with the ATCC and having ATCC accession numbers PTA-2561, PTA-2572, PTA-2566, PTA-2571, PTA-2562, PTA-2573, PTA-2560, PTA-2565, PTA-2568, PTA-2564, PTA-2569, PTA-2567, PTA-2559, PTA-2563, PTA-2570. In another preferred embodiment, an array comprises at least one polynucleotide having a sequence selected from the group consisting of SEQ ID NOS:14-39 and polynucleotides present in isolates deposited with the ATCC and having ATCC accession numbers PTA-2561, PTA-2572, PTA-2566, PTA-2571, PTA-2562, PTA-2573, PTA-2560, PTA-2565, PTA-2568, PTA-2564, PTA-2569, PTA-2567, PTA-2559, PTA-2563, PTA-2570 and at least one of a polynucleotide having a sequence shown in SEQ ID NO:42 or 43.

The polynucleotides described herein, as well as their gene products, are of particular interest as genetic or biochemical markers (e.g. in blood or tissues) that will detect the earliest changes along the carcinogenesis pathway and/or to monitor the efficacy of various therapies and preventive interventions. For example, the level of expression of certain polynucleotides can be indicative of a poorer prognosis, and therefore warrant more aggressive chemo- or radio-therapy for a patient or vice versa. The correlation of novel surrogate tumor specific features with response to treatment and outcome in patients can define prognostic indicators that allow the design of tailored therapy based on the molecular profile of the tumor. These therapies include antibody targeting, antagonists (e.g. small molecules), and gene therapy. Determining expression of certain polynucleotides and comparison of a patient's profile with known expression in normal tissue and variants of the disease allows a determination of the best possible treatment for a patient, both in terms of specificity of treatment and in terms of comfort level of the patient. Polynucleotide expression can also be used to better classify, and thus diagnose and treat, different forms and disease states of cancer. Two classifications widely used in oncology that can benefit from identification of the expression levels of the genes corresponding to the polynucleotides described herein are staging of the cancerous disorder, and grading the nature of the cancerous tissue.

The polynucleotides that correspond to differentially expressed genes, as well as their encoded gene products, can be useful to monitor patients having or susceptible to cancer to detect potentially malignant events at a molecular level before they are detectable at a gross morphological level. In addition, the polynucleotides described herein, as well as the genes corresponding to such polynucleotides, can be useful as therametrics, e.g. to assess the effectiveness of therapy by using the polynucleotides or their encoded gene products, to assess, for example, tumor burden in the patient before, during, and after therapy.

Furthermore, a polynucleotide identified as corresponding to a gene that is differentially expressed in, and thus is important for, one type of cancer can also have implications for development or risk of development of other types of cancer, e.g. where a polynucleotide represents a gene differentially expressed across various cancer types.

In another embodiment, the diagnostic and/or prognostic methods of the invention involve detection of expression of a selected set of genes in a test sample to produce a test expression pattern (TEP). The TEP is compared to a reference expression pattern (REP), which is generated by detection of expression of the selected set of genes in a reference sample (e.g. a positive or negative control sample). The selected set of genes includes at least one of the genes of the invention, which genes correspond to the polynucleotide sequences described herein. Of particular interest is a selected set of genes that includes genes differentially expressed in the disease for which the test sample is to be screened.

“Reference sequences” or “reference polynucleotides” as used herein in the context of differential gene expression analysis and diagnosis/prognosis refers to a selected set of polynucleotides, which selected set includes at least one or more of the differentially expressed polynucleotides described herein. A plurality of reference sequences, preferably comprising positive and negative control sequences, can be included as reference sequences. Additional suitable reference sequences are found in GenBank, Unigene, and other nucleotide sequence databases (including, e.g. expressed sequence tag (EST), partial, and full-length sequences).

“Reference array” means an array having reference sequences for use in hybridization with a sample, where the reference sequences include all, at least one of, or any subset of the differentially expressed polynucleotides described herein. Usually such an array will include at least 2 different reference sequences, and can include any one or all of the provided differentially expressed sequences. Arrays of interest can further comprise sequences, including polymorphisms, of other genetic sequences, particularly other sequences of interest for screening for a disease or disorder (e.g. cancer, dysplasia, or other related or unrelated diseases, disorders, or conditions). The oligonucleotide sequence on the array will usually be at least about 12 nt in length, and can be of about the length of the provided sequences, or can extend into the flanking regions to generate fragments of 100 nt to 200 nt in length or more. Reference arrays can be produced according to any suitable methods known in the art. For example, methods of producing large arrays of oligonucleotides are described in references 180 & 181 using light-directed synthesis techniques. Using a computer controlled system, a heterogeneous array of monomers is converted, through simultaneous coupling at a number of reaction sites, into a heterogeneous array of polymers. Alternatively, microarrays are generated by deposition of pre-synthesized oligonucleotides onto a solid substrate, for example as described in reference 182.

A “reference expression pattern” or “REP” as used herein refers to the relative levels of expression of a selected set of genes, particularly of differentially expressed genes, that is associated with a selected cell type, e.g. a normal cell, a cancerous cell, a cell exposed to an environmental stimulus, and the like. A “test expression pattern” or “TEP” refers to relative levels of expression of a selected set of genes, particularly of differentially expressed genes, in a test sample (e.g. a cell of unknown or suspected disease state, from which mRNA is isolated).

REPs can be generated in a variety of ways according to methods well known in the art. For example, REPs can be generated by hybridizing a control sample to an array having a selected set of polynucleotides (particularly a selected set of differentially expressed polynucleotides), acquiring the hybridization data from the array, and storing the data in a format that allows for ready comparison of the REP with a TEP. Alternatively, all expressed sequences in a control sample can be isolated and sequenced, e.g. by isolating mRNA from a control sample, converting the mRNA into cDNA, and sequencing the cDNA. The resulting sequence information roughly or precisely reflects the identity and relative number of expressed sequences in the sample. The sequence information can then be stored in a format (e.g. a computer-readable format) that allows for ready comparison of the REP with a TEP. The REP can be normalized prior to or after data storage, and/or can be processed to selectively remove sequences of expressed genes that are of less interest or that might complicate analysis (e.g. some or all of the sequences associated with housekeeping genes can be eliminated from REP data).
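The processing steps mentioned above, removal of sequences of lesser interest and normalization, can be sketched as follows. Normalizing each signal to the total remaining signal is only one possible scheme and is an assumption of this example, as are the function name, the gene names and the signal values.

```python
def build_rep(raw_signals, housekeeping=()):
    """Build a normalized reference expression pattern from raw signals.

    Entries listed in `housekeeping` are removed first. Each remaining signal
    is then divided by the total remaining signal, so the values sum to 1.
    """
    filtered = {
        gene: signal
        for gene, signal in raw_signals.items()
        if gene not in housekeeping
    }
    total = sum(filtered.values())
    if total <= 0:
        raise ValueError("no positive signal left after filtering")
    return {gene: signal / total for gene, signal in filtered.items()}


# Hypothetical raw hybridization signals in arbitrary units.
raw = {"gene_a": 30.0, "gene_b": 10.0, "hk_1": 500.0}
rep = build_rep(raw, housekeeping={"hk_1"})
print(rep)
```

Removing the strongly expressed housekeeping entry before normalization keeps it from dominating the relative levels of the genes of interest.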

TEPs can be generated in a manner similar to REPs, e.g. by hybridizing a test sample to an array having a selected set of polynucleotides, particularly a selected set of differentially expressed polynucleotides, acquiring the hybridization data from the array, and storing the data in a format that allows for ready comparison of the TEP with a REP. The REP and TEP to be used in a comparison can be generated simultaneously, or the TEP can be compared to previously generated and stored REPs.

In one embodiment of the invention, comparison of a TEP with a REP involves hybridizing a test sample with a reference array, where the reference array has one or more reference sequences for use in hybridization with a sample. The reference sequences include all, at least one of, or any subset of the differentially expressed polynucleotides described herein. Hybridization data for the test sample is acquired, the data normalized, and the produced TEP compared with a REP generated using an array having the same or similar selected set of differentially expressed polynucleotides. Probes that correspond to sequences differentially expressed between the two samples will show decreased or increased hybridization efficiency for one of the samples relative to the other.

Methods for collection of data from hybridization of samples with reference arrays are well known in the art. For example, the polynucleotides of the reference and test samples can be generated using a detectable fluorescent label, and hybridization of the polynucleotides in the samples detected by scanning the microarrays for the presence of the detectable label using, for example, a microscope and light source for directing light at a substrate. A photon counter detects fluorescence from the substrate, while an x-y translation stage varies the location of the substrate. A confocal detection device that can be used in the subject methods is described in reference 183. A scanning laser microscope is described in reference 163. A scan, using the appropriate excitation line, is performed for each fluorophore used. The digital images generated from the scan are then combined for subsequent analysis. For any particular array element, the fluorescent signal from one sample (e.g. a test sample) is compared to the fluorescent signal from another sample (e.g. a reference sample), and the relative signal intensity determined.

Methods for analyzing the data collected from hybridization to arrays are well known in the art. For example, where detection of hybridization involves a fluorescent label, data analysis can include the steps of determining fluorescent intensity as a function of substrate position from the data collected, removing outliers, i.e. data deviating from a predetermined statistical distribution, and calculating the relative binding affinity of the targets from the remaining data. The resulting data can be displayed as an image with the intensity in each region varying according to the binding affinity between targets and probes.
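The analysis steps above can be sketched for a single array element. The outlier rule used here (discarding values further than a chosen multiple of the median absolute deviation from the median) is an assumption standing in for whatever predetermined statistical distribution is actually applied, and the ratio of mean intensities is used only as a simple proxy for relative binding. All names and intensity values are hypothetical.

```python
from statistics import mean, median


def remove_outliers(values, cutoff=5.0):
    """Drop values lying more than `cutoff` median absolute deviations from the median."""
    center = median(values)
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        # Spread cannot be estimated, so nothing is discarded.
        return list(values)
    return [value for value in values if abs(value - center) <= cutoff * mad]


def relative_signal(test_pixels, reference_pixels, cutoff=5.0):
    """Return the ratio of mean test intensity to mean reference intensity.

    Outliers are removed from each set of pixel intensities before averaging.
    """
    test_mean = mean(remove_outliers(test_pixels, cutoff))
    reference_mean = mean(remove_outliers(reference_pixels, cutoff))
    return test_mean / reference_mean


# Hypothetical pixel intensities for one array element in two scans.
test_pixels = [100, 102, 98, 101, 500]   # 500 mimics a dust speck or artifact
reference_pixels = [50, 49, 51, 50]

ratio = relative_signal(test_pixels, reference_pixels)
print(round(ratio, 3))
```

Without the outlier step, the single aberrant pixel would raise the test mean from about 100 to about 180 and distort the ratio accordingly.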

In general, the test sample is classified as having a gene expression profile corresponding to that associated with a disease or non-disease state by comparing the TEP generated from the test sample to one or more REPs generated from reference samples (e.g. from samples associated with cancer or specific stages of cancer, dysplasia, samples affected by a disease other than cancer, normal samples, etc.). The criteria for a match or a substantial match between a TEP and a REP include expression of the same or substantially the same set of reference genes, as well as expression of these reference genes at substantially the same levels (e.g. no significant difference between the samples for a signal associated with a selected reference sequence after normalization of the samples, or at least no greater than about 25% to about 40% difference in signal strength for a given reference sequence). In general, a pattern match between a TEP and a REP includes a match in expression, preferably a match in qualitative or quantitative expression level, of at least one of, all or any subset of the differentially expressed genes of the invention.
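The match criteria above can be sketched as a comparison of two normalized patterns. The default tolerance of 25% corresponds to the lower end of the range given in the text; measuring the difference relative to the REP value, and requiring identical gene sets rather than substantially the same sets, are simplifying assumptions of this example. Function and gene names are hypothetical.

```python
def patterns_match(tep, rep, max_relative_difference=0.25):
    """Return True if the TEP matches the REP within the given tolerance.

    Both patterns map gene names to normalized expression levels. A match
    requires the same set of genes and, for every gene, a difference of no
    more than `max_relative_difference` relative to the REP level.
    """
    if set(tep) != set(rep):
        return False
    for gene, reference_level in rep.items():
        test_level = tep[gene]
        if reference_level == 0:
            # A relative difference is undefined, so require an exact zero.
            if test_level != 0:
                return False
            continue
        difference = abs(test_level - reference_level) / reference_level
        if difference > max_relative_difference:
            return False
    return True


# Hypothetical normalized expression patterns.
rep_cancer = {"gene_a": 1.0, "gene_b": 4.0}
tep_close = {"gene_a": 1.2, "gene_b": 3.5}
tep_far = {"gene_a": 2.0, "gene_b": 4.0}

print(patterns_match(tep_close, rep_cancer))
print(patterns_match(tep_far, rep_cancer))
```

A test sample would typically be compared against several stored REPs in turn, with the upper end of the range (about 40%) available by passing max_relative_difference=0.40.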

Pattern matching can be performed manually, or can be performed using a computer program. Methods for preparation of substrate matrices (e.g. arrays), design of oligonucleotides for use with such matrices, labeling of probes, hybridization conditions, scanning of hybridized matrices, and analysis of patterns generated, including comparison analysis, are described e.g. in reference 184.

H.6—HERV-K(CH)-Based Diagnostic Methods

The invention provides methods for diagnosing the presence of cancer in a test sample associated with expression of a polynucleotide in a test cell sample, comprising the steps of: i) detecting a level of expression of at least one polynucleotide of the invention, or a fragment thereof, or at least one polynucleotide found in an isolate selected from the group consisting of ATCC accession numbers given in Table 7, or a fragment thereof; and ii) comparing said level of expression of the polynucleotide in the test sample with a level of expression of polynucleotide in the control cell sample, wherein differential expression of the polynucleotide in the test cell sample relative to the level of polynucleotide expression in the control cell sample is indicative of the presence of cancer in the test cell sample.

In some embodiments of the present invention, the cancer is prostate cancer. In other embodiments of the present invention, the cancer is testicular cancer.

In yet other embodiments of the present invention, the detecting is measuring the level of an RNA transcript; measuring the level of a polynucleotide; or measuring by a method including PCR, TMA, bDNA, NAT or NASBA. In further embodiments, the polynucleotide is attached to a solid support.

The present invention also provides compositions comprising a test cell sample and an isolated polynucleotide of the present invention. The present invention further provides methods for detecting cancer associated with expression of a polypeptide in a test cell sample, comprising the steps of: i) detecting a level of expression of at least one polypeptide of the invention, or a fragment thereof; and ii) comparing said level of expression of the polypeptide in the test sample with a level of expression of polypeptide in the control cell sample, wherein an altered level of expression of the polypeptide in the test cell sample relative to the level of expression of the polypeptide in the control cell sample is indicative of the presence of cancer in the test cell sample. The present invention also provides methods for detecting cancer associated with the presence of an antibody in a test cell sample, comprising the steps of: i) detecting a level of an antibody of the present invention; and ii) comparing said level of said antibody in the test sample with a level of said antibody in the control cell sample, wherein an altered level of antibody in said test cell sample relative to the level of antibody in the control cell sample is indicative of the presence of cancer in the test cell sample. In some embodiments, the cancer is prostate cancer and in other embodiments, the cancer is testicular cancer.

This invention also provides methods for detecting cancer associated with elevated levels of HERV-K(CH) polynucleotides, in particular in prostate cancer, by means of (i) detecting polynucleotides having at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or at least 100% identity to the polynucleotide shown in SEQ ID NOS:7-10 or to polynucleotides in isolates deposited with the ATCC and having ATCC deposit accession numbers PTA-2561, PTA-2572, PTA-2566, PTA-2571, PTA-2562, PTA-2573, PTA-2560, PTA-2565, PTA-2568, PTA-2564, PTA-2569, PTA-2567, PTA-2559, PTA-2563, PTA-2570, as measured by the alignment program GCG Gap (Suite Version 10.1) using the default parameters: open gap=3 and extend gap=1, or polynucleotides hybridizing under high stringency conditions to the polynucleotide shown in SEQ ID NOS:7-10; (ii) detecting polypeptides, or fragments thereof, encoded by the sequences of (i); and (iii) detecting antibodies specific for one or more of the polypeptides. Furthermore, (iv) detecting particles associated with overexpression of HERV-K(CH) polynucleotides may also be used in the diagnosis of cancer, in particular, prostate cancer, and monitoring its progression.

The treatment regimen of a prostate or other cancer associated with elevated levels of HERV-K(CH) polynucleotides may also be monitored by detecting levels of the polynucleotides and polypeptides in order to assess the staging of the cancer and/or efficacy of particular cancer therapies.

The present invention provides methods of using the polynucleotides described herein for detecting cancer cells, in particular prostate cancer cells, facilitating diagnosis of cancer and the severity of a cancer (e.g. tumor grade, tumor burden, and the like) in a subject, facilitating a determination of the prognosis of a subject, and assessing the responsiveness of the subject to therapy (e.g. by providing a measure of therapeutic effect through, for example, assessing tumor burden during or following a chemotherapeutic regimen). Detection can be based on detection of a polynucleotide that is differentially expressed in a cancer cell, and/or detection of a polypeptide encoded by a polynucleotide that is differentially expressed in a cancer cell. The detection methods of the invention can be conducted in vitro or in vivo, on isolated cells, or in whole tissues or a bodily fluid (e.g. blood, plasma, serum, urine, and the like).

The detection methods can be provided as part of a kit. Thus, the invention further provides kits for detecting the presence and/or a level of a polynucleotide that is differentially expressed in a cancer cell (e.g. by detection of an mRNA encoded by the differentially expressed gene of interest), and/or a polypeptide encoded thereby, in a biological sample. Procedures using these kits can be performed by clinical laboratories, experimental laboratories, medical practitioners, or private individuals. The kits of the invention for detecting a polypeptide encoded by a polynucleotide that is differentially expressed in a cancer cell may comprise a moiety that specifically binds the polypeptide, which may be an antibody that binds the polypeptide or fragment thereof. The kits of the invention used for detecting a polynucleotide that is differentially expressed in a prostate cancer cell may comprise a moiety that specifically hybridizes to such a polynucleotide. The kit may optionally provide additional components that are useful in the procedure, including, but not limited to, buffers, developing reagents, labels, reacting surfaces, means for detection, control samples, standards, instructions, and interpretive information.

Accordingly, the present invention provides kits for detecting prostate cancer comprising at least one of polynucleotides having the sequence as shown in SEQ ID NOS:7-10, SEQ ID NOS:14-39, or fragments thereof, or having the sequence found in an isolate deposited with the ATCC and having ATCC accession numbers PTA-2561, PTA-2572, PTA-2566, PTA-2571, PTA-2562, PTA-2573, PTA-2560, PTA-2565, PTA-2568, PTA-2564, PTA-2569, PTA-2567, PTA-2559, PTA-2563, PTA-2570 or fragments thereof.

In some embodiments, methods are provided for detecting a polypeptide encoded by a gene differentially expressed in a prostate cancer cell. Any of a variety of known methods can be used for detection, including, but not limited to, immunoassay, using antibody that binds the polypeptide, e.g. by enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and the like; and functional assays for the encoded polypeptide, e.g. binding activity or enzymatic activity.

As will be readily apparent to the ordinarily skilled artisan upon reading the present specification, the detection methods and other methods described herein can be readily varied. Such variations are within the intended scope of the invention. For example, in the above detection scheme, the probe for use in detection can be immobilized on a solid support, and the test sample contacted with the immobilized probe. Binding of the test sample to the probe can then be detected in a variety of ways, e.g. by detecting a detectable label bound to the test sample to facilitate detection of test sample-immobilized probe complexes.

The present invention further provides methods for detecting the presence of and/or measuring a level of a polypeptide in a biological sample, which polypeptide is encoded by a polynucleotide that is differentially expressed in a prostate cancer cell, using an antibody specific for the encoded polypeptide. The methods generally comprise: a) contacting the sample with an antibody specific for a polypeptide encoded by a polynucleotide that is differentially expressed in a prostate cancer cell; and b) detecting binding between the antibody and molecules of the sample.

Detection of specific binding of the antibody specific for the encoded prostate cancer-associated polypeptide, when compared to a suitable control, is an indication that the encoded polypeptide is present in the sample. Suitable controls include a sample known not to contain the encoded polypeptide or known not to contain elevated levels of the polypeptide, such as normal prostate tissue; and a sample contacted with an antibody not specific for the encoded polypeptide, e.g. an anti-idiotype antibody. A variety of methods to detect specific antibody-antigen interactions are known in the art and can be used in the method, including, but not limited to, standard immunohistological methods, immunoprecipitation, an enzyme immunoassay, and a radioimmunoassay. In general, the specific antibody will be detectably labeled, either directly or indirectly. Direct labels include radioisotopes; enzymes whose products are detectable (e.g. luciferase, β-galactosidase, and the like); fluorescent labels (e.g. fluorescein isothiocyanate, rhodamine, phycoerythrin, and the like); fluorescence emitting metals, e.g. ¹⁵²Eu, or others of the lanthanide series, attached to the antibody through metal chelating groups such as EDTA; chemiluminescent compounds, e.g. luminol, isoluminol, acridinium salts, and the like; bioluminescent compounds, e.g. luciferin, aequorin (green fluorescent protein), and the like. The antibody may be attached (coupled) to an insoluble support, such as a polystyrene plate or a bead. Indirect labels include second antibodies specific for antibodies specific for the encoded polypeptide (“first specific antibody”), wherein the second antibody is labeled as described above; and members of specific binding pairs, e.g. biotin-avidin, and the like. The biological sample may be brought into contact with and immobilized on a solid support or carrier, such as nitrocellulose, that is capable of immobilizing cells, cell particles, or soluble proteins. The support may then be washed with suitable buffers, followed by contacting with a detectably-labeled first specific antibody. Detection methods are known in the art and will be chosen as appropriate to the signal emitted by the detectable label. Detection is generally accomplished in comparison to suitable controls, and to appropriate standards.

In some embodiments, the methods are adapted for use in vivo, e.g. to locate or identify sites where cancer cells, such as prostate cancer cells, are present.

In some embodiments, methods are provided for detecting a cancer cell by detecting expression in the cell of a transcript that is differentially expressed in a cancer cell. Any of a variety of known methods can be used for detection, including, but not limited to, detection of a transcript by hybridization with a polynucleotide that hybridizes to a polynucleotide that is differentially expressed in a prostate cancer cell; detection of a transcript by a polymerase chain reaction using specific oligonucleotide primers; and in situ hybridization of a cell using as a probe a polynucleotide that hybridizes to a gene that is differentially expressed in a prostate cancer cell. The methods can be used to detect and/or measure mRNA levels of a gene that is differentially expressed in a prostate cancer cell. In some embodiments, the methods comprise: a) contacting a sample with a polynucleotide that corresponds to a differentially expressed gene described herein under conditions that allow hybridization; and b) detecting hybridization, if any.

Detection of differential hybridization, when compared to a suitable control, is an indication of the presence in the sample of a polynucleotide that is differentially expressed in a cancer cell. Appropriate controls include, for example, a sample which is known not to contain a polynucleotide that is differentially expressed in a cancer cell, and use of a labeled polynucleotide of the same “sense” as the polynucleotide that is differentially expressed in the cancer cell. In a preferred embodiment, the cancer cell is a prostate cancer cell. Conditions that allow hybridization are known in the art, and have been described in more detail above. Detection can also be accomplished by any known method, including, but not limited to, in situ hybridization, PCR (polymerase chain reaction), RT-PCR (reverse transcription-PCR), TMA, bDNA, NASBA, and “Northern” or RNA blotting, or combinations of such techniques, using a suitably labeled polynucleotide. A variety of labels and labeling methods for polynucleotides are known in the art and can be used in the assay methods of the invention. Specific hybridization can be determined by comparison to appropriate controls.

Polynucleotides generally comprising at least 10 nt, at least 12 nt or at least 15 contiguous nucleotides of a polynucleotide provided herein, such as, for example, those having the sequence as depicted in SEQ ID NOS:7-10, and 3-28, are used for a variety of purposes, such as probes for detection of, and/or measurement of, transcription levels of a polynucleotide that is differentially expressed in a prostate cancer cell. A probe that hybridizes specifically to a polynucleotide disclosed herein should provide a detection signal at least 5-, 10-, or 20-fold higher than the background hybridization provided with other unrelated sequences. It should be noted that “probe” as used herein is meant to refer to a polynucleotide sequence used to detect a differentially expressed gene product in a test sample. As will be readily appreciated by the ordinarily skilled artisan, the probe can be detectably labeled and contacted with, for example, an array comprising immobilized polynucleotides obtained from a test sample (e.g. mRNA). Alternatively, the probe can be immobilized on an array and the test sample detectably labeled. These and other variations of the methods of the invention are well within the skill in the art and are within the scope of the invention.
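The fold-over-background criterion for a specific probe can be illustrated with a minimal sketch. The function name, the argument names and the example intensities are hypothetical and are not taken from the specification; only the 5-, 10- and 20-fold thresholds come from the text above.

```python
def is_specific_probe(signal: float, background: float, min_fold: float = 5.0) -> bool:
    """Return True when the probe signal is at least `min_fold` times the
    background hybridization signal (hypothetical helper for illustration)."""
    if background <= 0:
        raise ValueError("background must be a positive intensity")
    return signal / background >= min_fold


# Hypothetical intensities in arbitrary units.
print(is_specific_probe(500.0, 100.0))               # 5-fold threshold
print(is_specific_probe(500.0, 100.0, min_fold=10))  # stricter 10-fold threshold
```

A stricter threshold (10- or 20-fold) simply narrows the set of probes accepted as specific.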

Nucleotide probes are used to detect expression of a gene corresponding to the provided polynucleotide. In Northern blots, mRNA is separated electrophoretically and contacted with a probe. A probe is detected as hybridizing to an mRNA species of a particular size. The amount of hybridization can be quantitated to determine relative amounts of expression, for example under a particular condition. Probes are used for in situ hybridization to cells to detect expression. Probes can also be used in vivo for diagnostic detection of hybridizing sequences. Probes are typically labeled with a radioactive isotope. Other types of detectable labels can be used such as chromophores, fluorophores, and enzymes. Other examples of nucleotide hybridization assays are described in refs. 185 and 186.

PCR is another means for detecting small amounts of target nucleic acids (see, e.g. refs. 187, 188 & 189). Two primer polynucleotides that hybridize with the target nucleic acids are used to prime the reaction. The primers can be composed of sequence within or 3′ and 5′ to the HERV-K(CH) polynucleotides disclosed herein. Alternatively, if the primers are 3′ and 5′ to these polynucleotides, they need not hybridize to them or the complements. After amplification of the target with a thermostable polymerase, the amplified target nucleic acids can be detected by methods known in the art (e.g. Southern blot). mRNA or cDNA can also be detected by traditional blotting techniques (e.g. Southern blot, Northern blot, etc.) described in ref. 8 (e.g. without PCR amplification). In general, mRNA or cDNA generated from mRNA using a polymerase enzyme can be purified and separated using gel electrophoresis, and transferred to a solid support, such as nitrocellulose. The solid support is exposed to a labeled probe, washed to remove any unhybridized probe, and duplexes containing the labeled probe are detected.

Methods using PCR amplification can be performed on the DNA from a single cell, although it is convenient to use at least about 10⁵ cells. The use of the polymerase chain reaction is described in ref. 190, and a review of techniques may be found in pages 14.2 to 14.33 of reference 8. A detectable label may be included in the amplification reaction. Suitable detectable labels include fluorochromes (e.g. fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 6-carboxy-X-rhodamine (ROX), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxyfluorescein, 5-carboxyfluorescein (5-FAM), N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA), or 6-carboxy-2′,4′,7′,4,7-hexachlorofluorescein (HEX)), radioactive labels (e.g. ³²P, ³⁵S, ³H, etc.), and the like. The label may be a two stage system, where the polynucleotide is conjugated to biotin, haptens, etc. having a high affinity binding partner, e.g. avidin, specific antibodies, etc., where the binding partner is conjugated to a detectable label. The label may be conjugated to one or both of the primers. Alternatively, the pool of nucleotides used in the amplification is labeled, so as to incorporate the label into the amplification product.

The present invention further relates to methods of detecting/diagnosing a neoplastic or preneoplastic condition in a mammal (for example, a human).

Examples of conditions that can be detected/diagnosed in accordance with these methods include, but are not limited to, prostate cancers. Polynucleotides corresponding to genes that exhibit the appropriate expression pattern can be used to detect prostate cancer in a subject. Reference 191 reviews markers of cancer.

One detection/diagnostic method comprises: (a) obtaining from a mammal (e.g. a human) a biological sample, (b) detecting the presence in the sample of a HERV-K(CH) polypeptide and (c) comparing the amount of product present with that in a control sample. In accordance with this method, the presence in the sample of elevated levels of a HERV-K(CH) gene product indicates that the subject has a neoplastic or preneoplastic condition.

The compound is preferably a binding protein, e.g. an antibody, polyclonal or monoclonal, or antigen binding fragment thereof, which can be labeled with a detectable marker (e.g. fluorophore, chromophore or isotope, etc.). Where appropriate, the compound can be attached to a solid support. Determination of formation of the complex can be effected by contacting the complex with a further compound (e.g. an antibody) that specifically binds to the first compound (or complex). Like the first compound, the further compound can be attached to a solid support and/or can be labeled with a detectable marker.

The identification of elevated levels of HERV-K(CH) polypeptide in accordance with the present invention makes possible the identification of subjects (patients) that are likely to benefit from adjuvant therapy. For example, a biological sample from a post-primary therapy subject (e.g. subject having undergone surgery) can be screened for the presence of circulating HERV-K(CH) polypeptide, the presence of elevated levels of the polypeptide, determined by studies of normal populations, being indicative of residual tumor tissue. Similarly, tissue from the cut site of a surgically removed tumor can be examined (e.g. by immunofluorescence), the presence of elevated levels of product (relative to the surrounding tissue) being indicative of incomplete removal of the tumor. The ability to identify such subjects makes it possible to tailor therapy to the needs of the particular subject. Subjects undergoing non-surgical therapy (e.g. chemotherapy or radiation therapy) can also be monitored, the presence in samples from such subjects of elevated levels of HERV-K(CH) polypeptide being indicative of the need for continued treatment. Staging of the disease (for example, for purposes of optimizing treatment regimens) can also be effected, for example, by prostate biopsy, e.g. with antibody specific for a HERV-K(CH) polypeptide.

The present invention also relates to a kit that can be used in the detection of a HERV-K(CH) polypeptide. The kit can comprise a compound that specifically binds a HERV-K(CH) polypeptide, such as, for example, binding proteins including antibodies or binding fragments thereof (e.g. F(ab′)₂ fragments) disposed within a container means. The kit can further comprise ancillary reagents, for processing the binding assay.

DEFINITIONS

The term “comprising” means “including” as well as “consisting”, e.g. a composition “comprising” X may consist exclusively of X or may include something additional, e.g. X+Y.

The term “about” in relation to a numerical value x means, for example,x±10%.

The terms “neoplastic cells”, “neoplasia”, “tumor”, “tumor cells”, “cancer” and “cancer cells” (used interchangeably) refer to cells which exhibit relatively autonomous growth, so that they exhibit an aberrant growth phenotype characterized by a significant loss of control of cell proliferation (i.e. de-regulated cell division). Neoplastic cells can be malignant or benign and include prostate cancer derived tissue.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic representation of a human endogenous retrovirus with a depiction of the HERV-K(CH) polynucleotides and their position relative to the retrovirus.

FIG. 2 is a schematic representation of open reading frames within the HERV-K(HML-2.HOM) (also known as ‘ERVK6’) genome [1].

FIG. 3 shows splicing events described in the prior art [16] for HERV-K mRNAs.

FIG. 4 shows splice sites identified near the 5′ and 3′ ends of the env ORF. The three reading frames are shaded differently.

FIG. 5 shows northern blot analysis of PCAV transcripts in cancer cell lines. The top arrow on the left shows the position of the genomic mRNA transcript. The next arrow shows the position of the env transcript. The bottom two arrows show the positions of other ORFs. The lanes contain RNA from the following cell lines: (1) Tera 1; (2) DU145; (3) PC3; (4) MDA Pca-2b; (5) LNCaP. Tera 1 is a teratocarcinoma cell line; the others are prostatic carcinoma cell lines.

FIG. 6 shows an alignment of env genomic DNA sequences from 27 HERV-K viruses. A consensus sequence (SEQ ID NO:157) is shown on the bottom line.

FIGS. 7-9 show alignments of inferred polypeptide sequences for gag (7), pol (8) and env (9) from various HERV-K viruses, together with consensus sequences (SEQ ID NOS:158-160).

MODES FOR CARRYING OUT THE INVENTION

Certain aspects of the present invention are described in greater detail in the non-limiting examples that follow. The examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention, nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric.

Source of Human Prostate Cell Samples and Isolation of Polynucleotides Expressed by Them

Candidate polynucleotides that may represent genes differentially expressed in cancer were obtained from both publicly-available sources and from cDNA libraries generated from selected cell lines and patient tissues. A normalized cDNA library was prepared from one patient tumor tissue and cloned polynucleotides for spotting on microarrays were isolated from the library. Normal and tumor tissues from 13 patients were processed to generate T7 RNA polymerase transcribed polynucleotides, which were, in turn, assessed for expression in the microarrays. The tissues that served as sources for these libraries and polynucleotides are summarized in Table 4.

Normalization: The objective of normalization is to generate a cDNA library in which all transcripts expressed in a particular cell type or tissue are equally represented [refs. 192 & 193], and therefore isolation of as few as 30,000 recombinant clones in an optimally normalized library may represent the entire gene expression repertoire of a cell, estimated to number 10,000 per cell. The source materials for generating the normalized prostate libraries were cryopreserved prostate tumor tissue from a patient with Gleason grade 3+3 adenocarcinoma and normal prostate biopsies from a pool of at-risk subjects under medical surveillance. Prostate epithelia were harvested directly from frozen sections of tissue by laser capture microdissection (LCM, Arcturus Engineering Inc., Mountain View, Calif.), carried out according to methods well known in the art (e.g. ref. 194), to provide substantially homogeneous cell samples.
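As a rough illustration of why about 30,000 clones can cover a repertoire of about 10,000 transcripts, the following sketch estimates the expected fraction of transcripts sampled at least once. It assumes an idealized library in which every transcript is equally represented and clones are drawn independently; this simple sampling model and the function name are assumptions made for illustration and are not a calculation from the specification.

```python
def fraction_represented(n_clones: int, n_transcripts: int) -> float:
    """Expected fraction of distinct transcripts seen at least once when
    `n_clones` clones are drawn independently from a perfectly normalized
    library of `n_transcripts` equally abundant transcripts."""
    if n_clones < 0 or n_transcripts <= 0:
        raise ValueError("n_clones must be >= 0 and n_transcripts must be > 0")
    # Probability that one given transcript is missed by every clone.
    p_missed = (1.0 - 1.0 / n_transcripts) ** n_clones
    return 1.0 - p_missed


# Figures taken from the text: 30,000 clones, 10,000 transcripts per cell.
coverage = fraction_represented(30_000, 10_000)
print(f"expected coverage: {coverage:.1%}")  # roughly 95% under this model
```

Under these assumptions a three-fold excess of clones over transcripts samples most, though not strictly all, of the repertoire; a real library deviates from equal representation, so the figure is indicative only.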

Total RNA was extracted from LCM-harvested cells using RNeasy™ Protect Kit (Qiagen, Valencia, Calif.), following manufacturer's recommended procedures. RNA was quantified using RiboGreen™ RNA quantification kit (Molecular Probes, Inc. Eugene, Oreg.). One μg of total RNA was reverse transcribed and PCR amplified using SMART™ PCR cDNA synthesis kit (ClonTech, Palo Alto, Calif.). The cDNA products were size-selected by agarose gel electrophoresis using standard procedures (ref. 8). The cDNA was extracted using Bio 101 Geneclean® II kit (Qbiogene, Carlsbad, Calif.). Normalization of the cDNA was carried out using kinetics of hybridization principles: 1.0 μg of cDNA was denatured by heat at 100° C. for 10 minutes, then incubated at 42° C. for 42 hours in the presence of 120 mM NaCl, 10 mM Tris.HCl (pH=8.0), 5 mM EDTA.Na⁺ and 50% formamide. Single-stranded cDNA (“normalized” cDNA) was purified by hydroxyapatite chromatography (#130-0520, BioRad, Hercules, Calif.) following the manufacturer's recommended procedures, amplified and converted to double-stranded cDNA by three cycles of PCR amplification, and cloned into plasmid vectors using standard procedures (ref. 8). All primers/adaptors used in the normalization and cloning process are provided by the manufacturer in the SMART™ PCR cDNA synthesis kit (ClonTech, Palo Alto, Calif.). Supercompetent cells (XL-2 Blue Ultracompetent Cells, Stratagene, Calif.) were transfected with the normalized cDNA libraries, plated on solid media and grown overnight at 36° C.

Characterization of normalized libraries: The sequences of 10,000 recombinants per library were analyzed by capillary sequencing using the ABI PRISM 3700 DNA Analyzer (Applied Biosystems, California). To determine the representation of transcripts in a library, BLAST analysis was performed on the clone sequences to assign transcript identity to each isolated clone, i.e. the sequences of the isolated polynucleotides were first masked to eliminate low complexity sequences using the XBLAST masking program (refs. 195, 196 and 197). Generally, masking does not influence the final search results, except to eliminate sequences of relatively little interest due to their low complexity, and to eliminate multiple “hits” based on similarity to repetitive regions common to multiple sequences, e.g. Alu repeats. The remaining sequences were then used in a BLASTN vs. GenBank search. The sequences were also used as query sequence in a BLASTX vs. NRP (non-redundant proteins) database search.

Automated sequencing reactions were performed using a Perkin-Elmer PRISM Dye Terminator Cycle Sequencing Ready Reaction Kit containing AmpliTaq DNA Polymerase, FS, according to the manufacturer's directions. The reactions were cycled on a GeneAmp PCR System 9600 as per manufacturer's instructions, except that they were annealed at 20° C. or 30° C. for one minute. Sequencing reactions were ethanol precipitated, pellets were resuspended in 8 microliters of loading buffer, 1.5 microliters was loaded on a sequencing gel, and the data was collected by an ABI PRISM 3700 DNA Sequencer (Applied Biosystems, Foster City, Calif.).

The number of times a sequence is represented in a library is determined by performing sequence identity analysis on cloned cDNA sequences and assigning transcript identity to each isolated clone. First, each sequence was checked to see if it was a mitochondrial, bacterial or ribosomal contaminant. Such sequences were excluded from the subsequent analysis. Second, sequence artifacts (e.g. vector and repetitive elements) were masked and/or removed from each sequence.

Third, the remaining sequences were compared via BLAST [198] to GenBank and EST databases for gene identification and were compared with each other via FastA [199] to calculate the frequency of cDNA appearance in the normalized cDNA library. The sequences were also searched against the GenBank and GeneSeq nucleotide databases using the BLASTN program (BLASTN 1.3 MP [198]). Fourth, the sequences were analyzed against a non-redundant protein (NRP) database with the BLASTX program (BLASTX 1.3 MP [198]). This protein database is a combination of the Swiss-Prot, PIR, and NCBI GenPept protein databases. The BLASTX program was run using the default BLOSUM-62 substitution matrix with the filter parameter: “xnu+seg”. The score cutoff utilized was 75.

Assembly of overlapping clones into contigs was done using the program Sequencher (Gene Codes Corp.; Ann Arbor, Mich.). The assembled contigs were analyzed using the programs in the GCG package (Genetic Computer Group, University Research Park, 575 Science Drive, Madison, Wis. 53711) Suite Version 10.1.

Summary of polynucleotides described herein: Table 6 provides a summary of polynucleotides isolated as described above and identified as corresponding to a differentially expressed gene (see below). Specifically, Table 6 provides: 1) the HERVK ORF for each clone ID; 2) the clone ID assigned to each sequence; 3) the % of patients having an expression ratio of ≥2×, ≥2-5×, ≥5×, and less than ½×; and 4) the Tumor/Normal mRNA Expression Ratio per patient (“Pat”), e.g. patient 93, patient 95, patient 96, etc.

Detection of Elevated Levels of cDNA Associated with Prostate Cancer Using Arrays

cDNA sequences representing a variety of candidate genes to be screened for differential expression in prostate cancer were assayed by hybridization on polynucleotide arrays. The cDNA sequences included cDNA clones isolated from cell lines or tissues as described above. The cDNA sequences analyzed also included polynucleotides comprising sequence overlap with sequences in the Unigene database, and which encode a variety of gene products of various origins, functionality, and levels of characterization. cDNAs were spotted onto reflective slides (Amersham) according to methods well known in the art at a density of 9,216 spots per slide representing 4,608 sequences (including controls) spotted in duplicate, with approximately 0.8 μl of an approximately 200 ng/μl solution of cDNA.

PCR products of selected cDNA clones corresponding to the gene products of interest were prepared in a 50% DMSO solution. These PCR products were spotted onto Amersham aluminum microarray slides at a density of 9,216 clones per array using a Molecular Dynamics Generation III spotting robot. Clones were spotted in duplicate, for a total of 4,608 different sequences per chip.

cDNA probes were prepared from total RNA obtained by laser capture microdissection (LCM, Arcturus Engineering Inc., Mountain View, Calif.) of tumor tissue samples and normal tissue samples isolated from the patients described above.

Total RNA was first reverse transcribed into cDNA using a primer containing a T7 RNA polymerase promoter, followed by second strand DNA synthesis. cDNA was then transcribed in vitro to produce antisense RNA using the T7 promoter-mediated expression (e.g. ref. 200), and the antisense RNA was then converted into cDNA. The second set of cDNAs was again transcribed in vitro, using the T7 promoter, to provide antisense RNA. This antisense RNA was then fluorescently labeled, or the RNA was again converted into cDNA, allowing for a third round of T7-mediated amplification to produce more antisense RNA. Thus the procedure provided for two or three rounds of in vitro transcription to produce the final RNA used for fluorescent labeling. Probes were labeled by making fluorescently labeled cDNA from the RNA starting material. Fluorescently labeled cDNAs prepared from the tumor RNA sample were compared to fluorescently labeled cDNAs prepared from the normal cell RNA sample. For example, the cDNA probes from the normal cells were labeled with Cy3 fluorescent dye (green) and cDNA probes prepared from the tumor cells were labeled with Cy5 fluorescent dye (red).

The differential expression assay was performed by mixing equal amounts of probes from tumor cells and normal cells of the same patient. The arrays were pre-hybridized by incubation for about 2 hrs at 60° C. in 5×SSC/0.2% SDS/1 mM EDTA, and then washed three times in water and twice in isopropanol. Following pre-hybridization of the array, the probe mixture was then hybridized to the array under conditions of high stringency (overnight at 42° C. in 50% formamide, 5×SSC, and 0.2% SDS). After hybridization, the array was washed at 55° C. three times as follows: 1) first wash in 1×SSC/0.2% SDS; 2) second wash in 0.1×SSC/0.2% SDS; and 3) third wash in 0.1×SSC.

The arrays were then scanned for green and red fluorescence using a Molecular Dynamics Generation III dual color laser-scanner/detector. The images were processed using BioDiscovery Autogene software, and the data from each scan set normalized. The experiment was repeated, this time labeling the two probes with the opposite color in order to perform the assay in both “color directions.” Each experiment was sometimes repeated with two more slides (one in each color direction). The data from each scan was normalized, and the level of fluorescence for each sequence on the array expressed as a ratio of the geometric mean of 8 replicate spots/gene from the four arrays or 4 replicate spots/gene from 2 arrays or some other permutation.
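The combination of replicate spots across both color directions can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing performed by the BioDiscovery software: each spot is assumed to yield a normalized Cy5/Cy3 intensity ratio, the function and argument names are hypothetical, and the example values are invented. Because the geometric mean of ratios equals the ratio of geometric means, orienting each spot as tumor/normal and averaging gives the same result as averaging the two channels separately.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def tumor_normal_ratio(cy5_over_cy3: Sequence[float],
                       tumor_is_cy5: Sequence[bool]) -> float:
    """Combine replicate spots from both color directions into a single
    tumor/normal ratio. Spots from dye-swapped arrays (tumor labeled with
    Cy3) are inverted before averaging."""
    if len(cy5_over_cy3) != len(tumor_is_cy5):
        raise ValueError("one orientation flag is required per spot")
    oriented = [r if is_cy5 else 1.0 / r
                for r, is_cy5 in zip(cy5_over_cy3, tumor_is_cy5)]
    return geometric_mean(oriented)


# Hypothetical example: 4 replicate spots from 2 arrays (one per color direction).
ratios = [4.0, 3.6, 0.25, 0.28]          # Cy5/Cy3 per spot
directions = [True, True, False, False]  # True when tumor was labeled with Cy5
print(round(tumor_normal_ratio(ratios, directions), 2))
```

The geometric mean is used rather than the arithmetic mean so that a two-fold increase and a two-fold decrease cancel symmetrically.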

Table 6 summarizes the results for gene products differentially expressed in the prostate tumor samples relative to normal cells. The ratio of differential expression is expressed as the normalized hybridization signal associated with the tumor probe divided by the normalized hybridization signal with the normal probe; thus, a ratio greater than 1 indicates that the gene product is increased in expression in cancerous cells relative to normal cells, while a ratio of less than 1 indicates the opposite. The results from each patient are identified by “Pat” with the corresponding patient identification number. “Concordance” indicates the % of patients in which differential expression of the selected gene product in tumor cells was at least two-fold different from normal cells.
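The concordance figure can be sketched as a simple calculation over per-patient tumor/normal ratios. The sketch assumes that "at least two-fold different" covers both directions (a ratio of 2 or more, or of ½ or less); the function name and the example ratios are hypothetical and are not values from Table 6.

```python
from typing import Sequence


def concordance(ratios: Sequence[float], fold: float = 2.0) -> float:
    """Percentage of patients whose tumor/normal expression ratio differs
    from 1 by at least `fold` in either direction."""
    if not ratios:
        raise ValueError("at least one patient ratio is required")
    hits = sum(1 for r in ratios if r >= fold or r <= 1.0 / fold)
    return 100.0 * hits / len(ratios)


# Hypothetical tumor/normal ratios for four patients.
patient_ratios = [3.0, 1.2, 0.4, 6.0]
print(concordance(patient_ratios))            # percent with >= 2-fold difference
print(concordance(patient_ratios, fold=5.0))  # percent with >= 5-fold difference
```

Restricting the count to ratios of `fold` or more would give the percentage of patients with up-regulation only.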

In at least 79% of prostate patients assayed, 8 out of 10 genes, whose expression was elevated by at least 500%, were represented in HERV-K(CH) sequences.

Table 6 provides those gene products that were differentially expressed and were classified as gag, 5′-pol (reverse transcriptase) and 3′-pol (integrase) related sequences. It may be possible to examine the function of these gene products in development of cancer and metastasis through use of small molecule inhibitors known to affect the activity of such enzymes.

Analysis of the Prostate Cancer Associated Sequences

In order to determine whether there was homology to any known sequences, the PCR products of 16 different clones from one prostate tumor patient were sequenced. PCR products from these and other clones from the same library were spotted on DNA microarrays. RNA from 13 prostate tumor patients was assayed on the microarrays and then the full inserts of some of the 16 clones were sequenced (Table 6).

The 16 isolates were initially determined in a first pass sequencing reaction to have the sequences as shown in SEQ ID NOS:27-39, inclusive. The isolate from the normal prostate tissue was initially determined in a first pass sequencing reaction to have the sequence as shown in SEQ ID NO:41. A first pass sequencing reaction refers to a high-throughput process, where PCR reactions generate the sequencing template, then sequencing is performed with one of the PCR primers, in a single direction. A search of public databases revealed that these 16 isolates have some degree of identity to regions of the human endogenous retrovirus HERV-K(II) sequence disclosed in Genbank accession number AB047240 and shown in SEQ ID NO:44, and also to HERV-K(10), but are nonetheless unique.

The isolates were subjected to a second round of nucleic acid sequencing and were found to have the sequences as shown in SEQ ID NOS:14-26, inclusive. The isolate from the normal prostate tissue was subjected to a second round of nucleic acid sequencing and found to have the sequence as shown in SEQ ID NO:40. This second round of sequencing is a customized process, where sequencing is performed on purified dsDNA template in a DNA vector. Sequencing is done from both ends of the template, forward and reverse, with primers designed from the flanking regions of the vector, and new primers are synthesized for every additional reaction needed to span the entire insert.

The Genbank disclosure of HERV-K(II) provides only an incomplete characterization of its genetic features and no association with any disease. The Genbank disclosure characterizes HERV-K(II) as having a gag gene located at nucleotide 2113-4116 and an env gene located at nucleotide 7437-8174. Detailed analysis of the reported HERV-K(II) sequence indicates that the HERV-K(II) genome includes regions related to gag, protease, 5′-end of pol (reverse transcriptase) and 3′-end of pol (integrase) domains of a retrovirus. Specifically, the location of the protease gene is from about nucleotide 3917 to about 4920 and the location of the polymerase domain is from about nucleotide 4797 to about 7468.

Composite HERV-K(CH) polynucleotide sequences are shown in SEQ ID NOS:7, 8, 9 and 10, and FIG. 1 provides a schematic illustration of a human endogenous retrovirus and the HERV-K(CH) species within the schematic illustration. SEQ ID NO:7 is a composite sequence of the polynucleotides SEQ ID NOS:14-16, inclusive, and has a consensus sequence as shown in SEQ ID NO:11. This region corresponds to the gag region of a human endogenous retrovirus. SEQ ID NOS:8 and 9 are composite sequences of the polynucleotides having a sequence as shown in SEQ ID NOS:17-20, inclusive, and have a consensus sequence as shown in SEQ ID NO:12. This region corresponds to the 5′ pol region of a human endogenous retrovirus. SEQ ID NO:10 is a composite sequence of the polynucleotides having a sequence as shown in SEQ ID NOS:21-26, inclusive, and has a consensus sequence as shown in SEQ ID NO:13. This region corresponds to the 3′ pol region of a human endogenous retrovirus.

Homology to HERV-K(II) gag region varied from 87% to 99%. Homology to HERV-K(II) 5′-pol (reverse transcriptase) region varied from 87% to 97%. Homology to HERV-K(II) 3′-pol (integrase) region was approximately 89%. When compared to the human endogenous provirus HERV-K(10), the homology of the gag region clones was approximately 79%, the 5′-pol region between 81% and 89% and the 3′-pol region was approximately 89%. Table 5 illustrates the homology of the sequences of the individual clones with the corresponding HERV-K(II) and HERV-K(10) regions. Because the presence of polyA stretches in the HERV-K(CH) sequences (and deposited isolates) may be an artifact of cloning, the % identity shown in Table 5 was determined with alignments performed with polynucleotides excluding the terminal polyA stretch.
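Excluding a terminal polyA stretch before computing % identity can be sketched as follows. This is an illustrative sketch only: the specification does not state which alignment tool or identity definition was used for Table 5, so the minimum polyA run length (`min_run`), the convention of ignoring gapped columns, and the example sequences are all assumptions.

```python
import re


def strip_terminal_polya(seq: str, min_run: int = 10) -> str:
    """Remove a 3'-terminal run of at least `min_run` A residues.
    The threshold is an assumed value chosen for illustration."""
    return re.sub(r"A{%d,}$" % min_run, "", seq.upper())


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has
    a gap ('-'). Both inputs must come from the same pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Hypothetical clone sequence with a terminal polyA tail.
clone = "ACGTACGTAC" + "A" * 12
trimmed = strip_terminal_polya(clone)
print(trimmed)
# Hypothetical pre-aligned pair (alignment itself is not performed here).
print(percent_identity(trimmed, "ACGTACGTAA"))
```

The sketch trims the tail first and leaves the alignment step to an external tool, since trimming changes which residues are available to align.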

Consensus polynucleotide sequences SEQ ID NOS:11-13 were generated with Multiple Sequence Alignment (MSA), a web implementation of the GCG Pileup and Pretty programs. The program uses a clustering algorithm similar to the Clustal program described in reference. The default values for the alignments and consensus extraction were 8 for gap open and 2 for gap extension. The polling plurality, i.e. the minimum number of like sequences required to assign a residue to the consensus sequence, was 2.

The polynucleotide sequences shown in SEQ ID NOS:14-16, inclusive, were used for the consensus polynucleotide sequence shown in SEQ ID NO:11. The polynucleotide sequences shown in SEQ ID NOS:17-20, inclusive, were used for the consensus polynucleotide sequence shown in SEQ ID NO:12. The polynucleotide sequences shown in SEQ ID NOS:21-26, inclusive, were used for the consensus polynucleotide sequence shown in SEQ ID NO:13. An “N” marks a position with no qualifying representative base, i.e. a position at which no base is shared by at least two sequences.
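
The plurality rule described above (a residue is assigned only when at least two sequences agree, otherwise “N”) can be sketched in Python. This is not the GCG Pretty program itself: the function name, the treatment of gap characters and the handling of ties are assumptions made for illustration, and the input is assumed to be already aligned and of equal length.

```python
from collections import Counter


def plurality_consensus(aligned, plurality=2):
    """Call a consensus over equal-length aligned sequences.

    A column receives a base only if its most common base occurs at least
    `plurality` times and is not tied with another base; otherwise 'N'.
    Gap characters ('-') are ignored when counting (an assumption).
    """
    consensus = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != "-")
        top = counts.most_common(2)
        if top and top[0][1] >= plurality and (len(top) == 1 or top[0][1] > top[1][1]):
            consensus.append(top[0][0])
        else:
            consensus.append("N")
    return "".join(consensus)


# Three made-up aligned reads: the last column has no base seen twice.
print(plurality_consensus(["ACGT", "ACGA", "ATGC"]))
```

With three input sequences and a plurality of 2, a column yields “N” only when all three bases differ; with more sequences, ties become possible, which is why the sketch makes its tie rule explicit.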

Northern blotting of prostate cancer cell lines using nucleotides 243-end of SEQ ID NO:150 as a labeled probe indicates that they express PCAV transcripts of several sizes, corresponding to both full-length viral genomic sequences and to sub-genomic spliced transcripts (FIG. 5). Expression of such transcripts has also been observed in teratocarcinoma cell lines [15], as shown in lane 1 of FIG. 14.

Investigation of Other Human Endogenous Retroviruses

HERV-K(CH) is a member of the HML-2 subgroup of the HERV-K family. HERV-K(II) and HERV-K(10) are also members of this subgroup.

The same microarray techniques as described above were used to study the expression of members of the HERV-K family in the HML-2 and HML-6 subgroups in prostate tumor tissue. The expression of HERV-H viruses was also studied.

The results in Table 9 show that HERV-H is not up-regulated in prostate tumors. The HML-6 subgroup of HERV-K is also not up-regulated. Of the endogenous retroviruses tested, only those in the HML-2 subgroup are up-regulated in prostate tumors.
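
The “% of patient samples which showed up-regulation” figure reported in Tables 9 and 10 can be illustrated with a short Python sketch. The text does not state the cut-off used for those tables; the default threshold of 2.0 is an assumption borrowed from the “>= 2x” column of Table 6, and both the function name and the example ratios are hypothetical.

```python
def percent_upregulated(ratios, threshold=2.0):
    """Percentage of samples whose tumor/normal ratio is at least `threshold`."""
    if not ratios:
        raise ValueError("no ratios supplied")
    hits = sum(1 for ratio in ratios if ratio >= threshold)
    return 100.0 * hits / len(ratios)


# Hypothetical tumor/normal expression ratios for one probe in eight samples.
example_ratios = [5.0, 3.2, 2.0, 1.1, 0.9, 2.6, 1.8, 7.4]
print(round(percent_upregulated(example_ratios), 1))
```

A stricter threshold (for example 5.0) would be passed as the second argument to obtain the share of samples with stronger up-regulation.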

Investigation of Tumors Other than Prostate Tumors

HML-2 endogenous retroviruses are up-regulated in prostate tumors. Tumor samples taken from patients with breast and colon cancer were investigated for up-regulation of HML-2 and HML-6 HERV-K viruses using the microarray techniques described above.

The results in Table 10 show that the HML-2 viruses are up-regulated in tissue from prostate tumors, but not from colon or breast tumors. HML-6 expression is not up-regulated in any of the tumors.

Detection of HERV-K(CH) Sequences in Human Prostate Cancer Cells and Tissues.

DNA from prostate cancer tissue and other human cancer tissues, human colon, normal human tissues including non-cancerous prostate, and from other human cell lines is extracted following the procedure of ref. 202. The DNA is re-suspended in a solution containing 0.05 M Tris HCl buffer, pH 7.8, and 0.1 mM EDTA, and the amount of DNA recovered is determined by microfluorometry using Hoechst 33258 dye [ref. 203].

Polymerase chain reaction (PCR) is performed using Taq polymerase following the conditions recommended by the manufacturer (Perkin Elmer Cetus) with regard to buffer, Mg²⁺, and nucleotide concentrations. Thermocycling is performed in a DNA cycler by denaturation at 94° C. for 3 min. followed by either 35 or 50 cycles of 94° C. for 1.5 min., 50° C. for 2 min. and 72° C. for 3 min. The ability of the PCR to amplify the selected regions of the HERV-K(CH) gene is tested by using cloned HERV-K(CH) polynucleotide(s) as positive template(s). Optimal Mg²⁺ and primer concentrations and the requirements for the different cycling temperatures are determined with these templates. The master mix recommended by the manufacturer is used. To detect possible contamination of the master mix components, reactions without template are routinely tested.
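
As a worked example of the cycling program described above, the following Python sketch adds up the programmed hold times for the 35-cycle and 50-cycle variants. The temperatures and times are those stated in the text; the step labels "anneal" and "extend" are conventional names assumed here, and ramp times between temperatures are ignored.

```python
# Cycling conditions as stated in the text: (label, temperature in deg C, minutes).
INITIAL_DENATURATION = ("denature", 94, 3.0)
CYCLE_STEPS = [
    ("denature", 94, 1.5),
    ("anneal", 50, 2.0),   # label assumed; the text gives only temperature and time
    ("extend", 72, 3.0),   # label assumed; the text gives only temperature and time
]


def programmed_minutes(cycles):
    """Total programmed hold time in minutes, ignoring ramp times."""
    per_cycle = sum(minutes for _, _, minutes in CYCLE_STEPS)
    return INITIAL_DENATURATION[2] + cycles * per_cycle


for n_cycles in (35, 50):
    print(n_cycles, "cycles:", programmed_minutes(n_cycles), "min")
```

Each cycle holds for 6.5 minutes in total, so the choice between 35 and 50 cycles changes the run length by more than an hour and a half before ramping is taken into account.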

Southern blotting and hybridization are performed as described in reference 204, using the cloned sequences labeled by the random primer procedure [205]. Prehybridization and hybridization are performed in a solution containing 6×SSPE, 5% Denhardt's, 0.5% SDS, 50% formamide and 100 μg/ml denatured salmon testis DNA, incubated for 18 hrs at 42° C., followed by washings with 2×SSC and 0.5% SDS at room temperature and at 37° C. and finally in 0.1×SSC with 0.5% SDS at 68° C. for 30 min (ref. 8). For paraffin-embedded tissue sections the conditions described in ref. 206 are followed using primers designed to detect a 250 bp sequence.

Expression of Cloned Polynucleotides in Host Cells.

To study the polypeptide products of HERV-K(CH) cDNA, restriction fragments from the HERV-K(CH) cDNA are cloned into the expression vector pMT2 (pages 16.17-16.22 of ref. 8) and transfected into COS cells grown in DMEM supplemented with 10% FCS. Transfections are performed employing calcium phosphate techniques (pages 16.32-16.40 of ref. 8) and cell lysates are prepared forty-eight hours after transfection from both transfected and untransfected COS cells. Lysates are subjected to analysis by immunoblotting using anti-peptide antibody.

In immunoblotting experiments, preparation of cell lysates and electrophoresis are performed according to standard procedures. Protein concentration is determined using BioRad protein assay solutions. After semi-dry electrophoretic transfer to nitrocellulose, the membranes are blocked in 500 mM NaCl, 20 mM Tris, pH 7.5, 0.05% Tween-20 (TTBS) with 5% dry milk. After washing in TTBS and incubation with secondary antibodies (Amersham), enhanced chemiluminescence (ECL) protocols (Amersham) are performed as described by the manufacturer to facilitate detection.

Generation of Antibodies Against Polypeptides.

Polypeptides unique to HERV-K(CH) are synthesized or isolated from bacterial or other (e.g. yeast, baculovirus) expression systems and conjugated to rabbit serum albumin (RSA) with m-maleimido benzoic acid N-hydroxysuccinimide ester (MBS) (Pierce, Rockford, Ill.). Immunization protocols with these peptides are performed according to standard methods. Initially, a pre-bleed of the rabbits is performed prior to immunization. The first immunization includes Freund's complete adjuvant and 500 μg conjugated peptide or 100 μg purified peptide. All subsequent immunizations, performed four weeks after the previous injection, include Freund's incomplete adjuvant with the same amount of protein. Bleeds are conducted seven to ten days after the immunizations.
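
The timing in this protocol (boosts every four weeks, bleeds seven to ten days after each immunization) can be laid out with a short Python sketch. The function name, the start date and the number of boosts are hypothetical; only the intervals are taken from the text.

```python
from datetime import date, timedelta


def immunization_schedule(first_injection, n_boosts):
    """Return (injection, earliest_bleed, latest_bleed) dates for each dose.

    Boosts follow the previous injection by four weeks; the bleed window
    runs from seven to ten days after each injection, as stated in the text.
    """
    plan = []
    for dose in range(n_boosts + 1):
        injection = first_injection + timedelta(weeks=4 * dose)
        plan.append((injection,
                     injection + timedelta(days=7),
                     injection + timedelta(days=10)))
    return plan


# Hypothetical start date and two boosts, for illustration only.
for injection, bleed_from, bleed_to in immunization_schedule(date(2000, 1, 3), 2):
    print(injection, "bleed between", bleed_from, "and", bleed_to)
```

The pre-bleed taken before the first immunization is not part of the returned plan; it would simply be scheduled on or before the first injection date.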

For affinity purification of the antibodies, the corresponding HERV-K(CH) polypeptide is conjugated to RSA with MBS, and coupled to CNBr-activated Sepharose (Pharmacia, Sweden). Antiserum is diluted 10-fold in 10 mM Tris-HCl, pH 7.5, and incubated overnight with the affinity matrix. After washing, bound antibodies are eluted from the resin with 100 mM glycine, pH 2.5.

ELISA Assay for Detecting HERV-K(CH) Gag and/or Pol Related Sequences.

To test blood samples for antibodies that bind specifically to recombinantly produced HERV-K(CH) antigens, the following procedure is employed. After the recombinant HERV-K(CH) pol, gag or env related polypeptides are purified, the recombinant polypeptide is diluted in PBS to a concentration of 5 μg/ml (500 ng/100 μl). 100 microliters of the diluted antigen solution is added to each well of a 96-well Immulon 1 plate (Dynatech Laboratories, Chantilly, Va.), and the plate is then incubated for 1 hour at room temperature, or overnight at 4° C., and washed 3 times with 0.05% Tween 20 in PBS. Blocking to reduce nonspecific binding of antibodies is accomplished by adding to each well 200 μl of a 1% solution of bovine serum albumin in PBS/Tween 20 and incubating for 1 hour. After aspiration of the blocking solution, 100 μl of the primary antibody solution (anticoagulated whole blood, plasma, or serum), diluted in the range of 1/16 to 1/2048 in blocking solution, is added and incubated for 1 hour at room temperature or overnight at 4° C. The wells are then washed 3 times, and 100 μl of goat anti-human IgG antibody conjugated to horseradish peroxidase (Organon Teknika, Durham, N.C.), diluted 1/500 or 1/1000 in PBS/Tween 20, is added to each well. Subsequently, 100 μl of o-phenylenediamine dihydrochloride (OPD, Sigma) solution is added to each well and incubated for 5-15 minutes. The OPD solution is prepared by dissolving a 5 mg OPD tablet in 50 ml 1% methanol in H₂O and adding 50 μl 30% H₂O₂ immediately before use. The reaction is stopped by adding 25 μl of 4 M H₂SO₄. Absorbances are read at 490 nm in a microplate reader (Bio-Rad).
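
The quantities in this protocol can be checked with a little arithmetic, sketched below in Python. The coating mass per well follows directly from the stated concentration and volume. The two-fold dilution series is an assumption: the text gives only the range 1/16 to 1/2048, not the step between dilutions, and the function names are hypothetical.

```python
def antigen_per_well_ng(conc_ug_per_ml, volume_ul):
    """Mass of antigen added to one well, in nanograms."""
    # 1 ug/ml is the same as 1 ng/ul, so concentration times volume (ul) gives ng.
    return conc_ug_per_ml * volume_ul


def twofold_series(first_denominator, last_denominator):
    """Dilution denominators doubling from the first to the last, inclusive."""
    series = []
    denominator = first_denominator
    while denominator <= last_denominator:
        series.append(denominator)
        denominator *= 2
    return series


print(antigen_per_well_ng(5, 100), "ng antigen per well")
print(["1/%d" % d for d in twofold_series(16, 2048)])
```

Under the two-fold assumption the stated range corresponds to eight dilutions per sample, which fits one column of a 96-well plate.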

Preparation of Vaccines.

The present invention also relates to a method of stimulating an immune response against cells that express HERV-K(CH) polypeptides in a patient using HERV-K(CH) gag and/or pol polypeptides of the invention that act as an antigen produced by or associated with a malignant cell. This aspect of the invention provides a method of stimulating an immune response in a human against prostate cells or cells that express HERV-K(CH) pol or gag polynucleotides and polypeptides. The method comprises the step of administering to a human an immunogenic amount of a polypeptide comprising: (a) the amino acid sequence of a human endogenous retrovirus HERV-K(CH) polypeptide or (b) a mutein or variant of a polypeptide comprising the amino acid sequence of a human endogenous retrovirus HERV-K(CH) polypeptide.

Generation of Transgenic Animals Expressing Polypeptides as a Means for Testing Therapeutics.

HERV-K(CH) nucleic acids are used to generate genetically modified non-human animals, or site-specific gene modifications in cell lines, for the study of the function or regulation of prostate tumor-related genes, or to create animal models of diseases, including prostate cancer. The term “transgenic” is intended to encompass genetically modified animals having an exogenous HERV-K(CH) gene(s) that is stably transmitted in the host cells where the gene(s) may be altered in sequence to produce a modified polypeptide, or having an exogenous HERV-K(CH) LTR promoter operably linked to a reporter gene. Transgenic animals may be made through a nucleic acid construct randomly integrated into the genome. Vectors for stable integration include plasmids, retroviruses and other animal viruses, YACs, and the like. Of interest are transgenic mammals, e.g. cows, pigs, goats, horses, etc., and particularly rodents, e.g. rats, mice, etc.

The modified cells or animals are useful in the study of HERV-K(CH) gene function and regulation. For example, a series of small deletions and/or substitutions may be made in the HERV-K(CH) gene to determine the role of different domains in prostate tumorigenesis. Specific constructs of interest include, but are not limited to, anti-sense constructs to block HERV-K(CH) gene expression, expression of dominant negative HERV-K(CH) gene mutations, and over-expression of a HERV-K(CH) gene. Expression of a HERV-K(CH) gene or variants thereof in cells or tissues where it is not normally expressed or at abnormal times of development is provided. In addition, by providing expression of polypeptides derived from HERV-K(CH) in cells in which they are otherwise not normally produced, changes in cellular behavior can be induced.

DNA constructs for random integration need not include regions of homology to mediate recombination. Conveniently, markers for positive and negative selection are included. For various techniques for transfecting mammalian cells, see ref. 207.

For embryonic stem (ES) cells, an ES cell line is employed, or embryonic cells are obtained freshly from a host, e.g. mouse, rat, guinea pig, etc. Such cells are grown on an appropriate fibroblast-feeder layer or grown in the presence of appropriate growth factors, such as leukemia inhibiting factor (LIF). When ES cells are transformed, they may be used to produce transgenic animals. After transformation, the cells are plated onto a feeder layer in an appropriate medium. Cells containing the construct may be detected by employing a selective medium. After sufficient time for colonies to grow, they are picked and analyzed for the occurrence of integration of the construct. Those colonies that are positive may then be used for embryo manipulation and blastocyst injection. Blastocysts are obtained from 4 to 6 week old superovulated females. The ES cells are trypsinized, and the modified cells are injected into the blastocoel of the blastocyst. After injection, the blastocysts are returned to each uterine horn of pseudopregnant females. Females are then allowed to go to term and the resulting chimeric animals are screened for cells bearing the construct. By providing for a different phenotype of the blastocyst and the ES cells, chimeric progeny can be readily detected.

The chimeric animals are screened for the presence of the modified gene, and males and females having the modification are mated to produce homozygous progeny. If the gene alterations cause lethality at some point in development, tissues or organs are maintained as allogeneic or congenic grafts or transplants, or in in vitro culture. The transgenic animals may be any non-human mammal, such as laboratory animals, domestic animals, etc. The transgenic animals are used in functional studies, drug screening, etc., e.g. to determine the effect of a candidate drug on prostate cancer, to test potential therapeutics or treatment regimens, etc.

Diagnostic Imaging Using HERV-K(CH) Specific Antibodies

The present invention encompasses the use of antibodies to HERV-K(CH) polypeptides to accurately stage prostate cancer patients at initial presentation and for early detection of metastatic spread of prostate cancer. Radioimmunoscintigraphy using monoclonal antibodies specific for HERV-K(CH) gag or HERV-K(CH) pol or portions thereof or other HERV-K(CH) polypeptides can provide an additional tumor-specific diagnostic test. The monoclonal antibodies of the instant invention are used for histopathological diagnosis of prostate carcinomas.

Subcutaneous human xenografts of prostate cancer cells in nude mice are used to test whether a technetium-99m (^(99m)Tc)-labeled monoclonal antibody of the invention can successfully image the xenografted prostate cancer by external gamma scintigraphy as described for seminoma cells in ref. 208. Each monoclonal antibody specific for a HERV-K(CH) polypeptide is purified from ascitic fluid of BALB/c mice bearing hybridoma tumors by affinity chromatography on protein A-Sepharose. Purified antibodies, including control monoclonal antibodies such as an avidin-specific monoclonal antibody [209], are labeled with ^(99m)Tc following reduction, using the methods of refs. 210 and 211. Nude mice bearing human prostate cancer cells are injected intraperitoneally with 200-500 μCi of ^(99m)Tc-labeled antibody. Twenty-four hours after injection, images of the mice are obtained using a Siemens ZLC3700 gamma camera equipped with a 6 mm pinhole collimator set approximately 8 cm from the animal. To determine monoclonal antibody biodistribution following imaging, the normal organs and tumors are removed and weighed, and the radioactivity of the tissues and of a sample of the injectate is measured. Additionally, HERV-K(CH)-specific antibodies conjugated to antitumor compounds are used as prostate cancer-specific chemotherapy.
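
The biodistribution measurement described above is commonly summarized as percent injected dose per gram of tissue (%ID/g). The text does not name this metric, so the Python sketch below is an assumption about how the tissue weights, tissue counts and injectate sample might be combined; the function names and example numbers are hypothetical. The only physical constant used is the half-life of ^(99m)Tc, about 6.01 hours.

```python
TC99M_HALF_LIFE_H = 6.01  # physical half-life of technetium-99m, in hours


def decay_fraction(elapsed_h, half_life_h=TC99M_HALF_LIFE_H):
    """Fraction of the original activity remaining after `elapsed_h` hours."""
    return 0.5 ** (elapsed_h / half_life_h)


def percent_injected_dose_per_gram(tissue_counts, tissue_weight_g,
                                   standard_counts, standard_fraction):
    """%ID/g, assuming tissue and injectate standard are counted together.

    `standard_fraction` is the fraction of the injected dose represented by
    the injectate sample (for example 0.01 for a 1% standard). Counting both
    at the same time makes an explicit decay correction unnecessary.
    """
    injected_counts = standard_counts / standard_fraction
    return 100.0 * tissue_counts / injected_counts / tissue_weight_g


# Hypothetical counts: a 0.5 g tumor and a 1% standard of the injectate.
print(percent_injected_dose_per_gram(5000, 0.5, 10000, 0.01))
print(decay_fraction(24))  # share of activity left at the 24 h imaging time
```

Because roughly four half-lives elapse between injection and imaging, only a few percent of the injected activity remains at 24 hours, which is why the injectate sample is measured alongside the tissues rather than relying on the nominal injected dose.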

Deposits

The materials listed in Table 7 were deposited with the American Type Culture Collection.

All publications and patent applications mentioned in this specification are incorporated herein by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.

The foregoing description of preferred embodiments of the invention has been presented by way of illustration and example for purposes of clarity and understanding. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. It will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that many changes and modifications may be made thereto without departing from the spirit of the invention. It is intended that the scope of the invention be defined by the appended claims and their equivalents.

TABLE 1 GAG protease (5′) probes, isolate specific

Isolate  Nucleotides  SEQ ID
K(CH)    1224-1238    161
KII      2098-2114    162
K10      874-890      163
K10      894-908      164
K10      910-927      165
K10      927-944      166
K10      989-1004     167
K10      1019-1036    168
K10      1046-1063    169
K10      1063-1078    170
K10      1084-1103    171
K10      1131-1145    172
K10      1148-1163    173
K10      1164-1185    174
K10      1206-1223    175
K10      1216-1235    176
K10      1243-1260    177
K10      1258-2375    178
K10      1277-1295    179
K10      1300-1329    180
K10      1347-1361    181
K10      1367-1382    182
K10      1392-1410    183
K10      1412-1428    184
K10      1426-1442    185
K10      1445-1461    186
K10      1463-1477    187
K10      1490-1510    188
K10      1502-1520    189
K10      1522-1538    190
K10      1561-1576    191
K10      1586-1605    192
K10      1620-1635    193
K10      1653-1669    194
K10      1698-1723    195
K10      1722-1743    196
K10      1748-1762    197
K10      1773-1788    198
K10      1820-1834    199
K10      1872-1887    200
K10      1917-1935    201
K10      1940-1955    202
K10      1955-1969    203
K10      1973-1995    204
K10      2008-2042    205
K10      2049-2064    206
K10      2076-2093    207
K10      2097-2113    208
K10      2122-2139    209
K10      2148-2118    210
K10      2176-2196    211
K10      2198-2212    212
K10      2219-2235    213
K10      2246-2261    214

TABLE 2 Protease (3′ seq) Polymerase (5′ seq) Probes

Isolate          Nucleotides  SEQ ID
K(CH) consensus  170-188      215
K(CH) consensus  205-221      216
K(CH) consensus  253-268      217
K(CH) consensus  316-336      218
K(CH) consensus  401-417      219
K(CH) consensus  490-504      220
K(CH) consensus  538-552      221
K(CH) consensus  872-886      222
K(CH)            109-125      223
K(CH)            1374-1388    224
K(CH)            1402-1416    225
KII              140-159      110
KII              410-426      111
KII              1127-1141    112
K10              11-38        113
K10              37-54        114
K10              70-90        115
K10              226-243      116
K10              249-264      117
K10              308-324      118
K10              327-342      119
K10              381-397      120
K10              440-454      121
K10              541-557      122
K10              678-698      123
K10              722-741      124
K10              753-767      125
K10              771-785      126
K10              854-869      127
K10              872-890      128
K10              1195-1209    129
K10              1308-1323    130
K10              1335-1349    131
K10              1349-1365    132

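TABLE 3 3′ POL probes only

Isolate          Nucleotides  SEQ ID
K(CH) consensus  3-17         133
K(CH) consensus  25-39        134
K(CH) consensus  82-104       135
K(CH) consensus  136-151      136
K(CH) consensus  154-169      137
K(CH) consensus  189-203      138
K(CH) consensus  322-337      139
K(CH) consensus  461-475      140
K(CH) consensus  630-645      141
K(CH) consensus  712-727      142
K(CH) consensus  757-771      143
K(CH) consensus  818-833      144
KII              1636-1651    145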

TABLE 4 ORFs and sources of initial isolates/clones from prostate cDNA libraries

HERVK ORF  Chiron Clone ID  Source of Clone
gag        035JN002.E02     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
gag        035JN013.H09     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
gag        035JN023.F12     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
gag        037XN001.D10     Normal Prostate Tissue, Pooled from 10 individuals
pol5′      035JN001.F06     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol5′      035JN003.E06     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol5′      035JN013.C11     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol5′      035JN013.F03     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol3′      035JN003.G09     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol3′      035JN010.A09     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol3′      035JN015.F06     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol3′      035JN020.B12     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol3′      035JN020.D07     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol3′      035JN022.G09     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol3′      035JN015.H02     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3
pol3′      035JN016.H02     Prostate Cancer Tissue, Patient 101, Gleason Grade 3 + 3

TABLE 5 Identity of HERV-K(CH) polynucleotides with HERV-K(II) and HERV-K(10)

Clone ID      Region  % Identity HERV-K(II)  % Identity HERV-K(10)
035JN003.G09  3′-pol  89.423                 89.423
035JN010.A09  3′-pol  89.663                 89.663
035JN015.F06  3′-pol  89.423                 89.423
035JN020.B12  3′-pol  89.303                 89.303
035JN020.D07  3′-pol  89.614                 89.614
035JN022.G09  3′-pol  89.354                 89.354
035JN002.E02  gag     99.524                 79.881
035JN013.H09  gag     99.017                 79.975
035JN023.F12  gag     98.849                 79.335
035XN001.D10  gag     87.383                 79.947
035JN001.F06  5′-pol  97.211                 88.844
035JN003.E06  5′-pol  97.450                 86.723
035JN013.C11  5′-pol  97.156                 85.444
035JN013.F03  5′-pol  87.962                 81.521

TABLE 6 DNA microarray results: 13 patients tumor vs. normal prostate, expression of HERV-K RNA

Percent Patient with Expression Ratio (>= 2x, >= 2-5x, >= 5x, <= halfx) and Tumor/Normal mRNA Expression Ratio (Pat 93, Pat 95)

HERVK ORF  Chiron Clone ID  >= 2x     >= 2-5x   >= 5x     <= halfx  Pat 93    Pat 95
gag        035JN002.E02     57.1      42.9      7.1       0.0       4.8       3.0
gag        035JN013.H09     78.6      78.6      50.0      0.0       9.3       4.5
gag        035JN023.F12     78.6      78.6      57.1      0.0       9.1       4.1
gag        037XN001.D10     64.3      64.3      14.3      0.0       5.4       3.4
pol5prime  035JN001.F06     42.9      21.4      7.1       0.0       2.0       2.6
pol5prime  035JN003.E06     42.9      21.4      7.1       0.0       2.1       2.6
pol5prime  035JN013.C11     85.7      78.6      57.1      0.0       6.9       5.6
pol5prime  035JN013.F03     85.7      71.4      21.4      0.0       4.6       3.4
pol3prime  035JN003.G09     71.4      57.1      7.1       0.0       4.1       3.3
pol3prime  035JN010.A09     85.7      78.6      71.4      0.0       8.0       4.4
pol3prime  035JN015.F06     85.7      78.6      71.4      0.0       7.6       4.0
pol3prime  035JN020.B12     85.7      78.6      64.3      0.0       7.0       4.0
pol3prime  035JN020.D07     85.7      78.6      57.1      0.0       6.0       3.2
pol3prime  035JN022.G09     78.6      78.6      57.1      0.0       6.6       4.2
pol3prime  035JN015.H02     85.7      78.6      57.1      0.0       7.9       4.2
pol3prime  035JN016.H02     71.4      71.4      14.3      0.0       3.8       3.0

Tumor/Normal mRNA Expression Ratio

HERVK ORF  Chiron Clone ID  Pat 96   Pat 97   Pat 151  Pat 155  Pat 231  Pat 232
gag        035JN002.E02     2.1      1.0      2.3      2.5      1.9      1.7
gag        035JN013.H09     5.2      1.4      5.5      13.8     4.2      3.5
gag        035JN023.F12     5.1      1.6      5.5      17.0     4.5      3.2
gag        037XN001.D10     2.5      1.5      3.6      4.6      2.9      1.8
pol5prime  035JN001.F06     1.8      1.5      2.7      1.8      2.0      1.8
pol5prime  035JN003.E06     1.8      1.4      2.6      1.9      2.0      1.7
pol5prime  035JN013.C11     6.9      2.0      7.4      24.0     4.8      4.3
pol5prime  035JN013.F03     3.7      2.2      4.6      8.4      4.1      3.4
pol3prime  035JN003.G09     3.3      1.6      4.9      3.3      2.2      3.5
pol3prime  035JN010.A09     12.6     2.1      12.4     55.9     5.1      9.5
pol3prime  035JN015.F06     12.8     2.2      11.9     53.4     5.1      8.0
pol3prime  035JN020.B12     10.5     2.2      11.9     34.9     5.0      6.8
pol3prime  035JN020.D07     8.7      2.0      13.7     22.9     4.6      8.6
pol3prime  035JN022.G09     6.6      2.0      8.8      12.7     4.5      5.3
pol3prime  035JN015.H02     9.0      2.1      10.7     35.3     4.7      7.5
pol3prime  035JN016.H02     3.4      1.9      4.3      5.0      3.0      3.1

Tumor/Normal mRNA Expression Ratio

HERVK ORF  Chiron Clone ID  Pat 251  Pat 282  Pat 286  Pat 294  Pat 351
gag        035JN002.E02     6.9      1.5      0.6      2.6      2.9
gag        035JN013.H09     31.2     4.5      1.0      12.1     8.6
gag        035JN023.F12     28.2     5.2      1.0      12.7     7.3
gag        037XN001.D10     10.0     1.7      1.0      3.5      4.3
pol5prime  035JN001.F06     7.8      1.2      1.0      1.9      2.3
pol5prime  035JN003.E06     7.7      1.2      1.0      1.8      2.1
pol5prime  035JN013.C11     37.4     4.4      1.0      13.1     8.8
pol5prime  035JN013.F03     21.8     2.3      1.0      5.0      5.8
pol3prime  035JN003.G09     14.9     1.5      1.0      2.5      3.9
pol3prime  035JN010.A09     70.0     5.8      1.0      26.3     9.7
pol3prime  035JN015.F06     69.7     5.9      1.0      25.3     9.1
pol3prime  035JN020.B12     44.5     5.2      1.0      15.2     8.1
pol3prime  035JN020.D07     58.2     3.8      1.0      15.8     7.6
pol3prime  035JN022.G09     28.0     2.6      1.0      5.9      7.8
pol3prime  035JN015.H02     49.5     4.8      1.0      18.2     8.7
pol3prime  035JN016.H02     14.1     1.7      1.0      2.6      5.0

TABLE 7 DEPOSITS

Cell Line    CMCC Accession No.  ATCC Accession No.
035JN003G09  5400                PTA 2561
035JN010A09  5401                PTA 2572
035JN015F06  5402                PTA 2566
035JN015H02  5403                PTA 2571
035JN020B12  5405                PTA 2562
035JN020D07  5406                PTA 2573
035JN022G09  5413                PTA 2560
035JN002E02  5404                PTA 2565
035JN013H09  5408                PTA 2568
035JN023F12  5409                PTA 2564
035XN001D10  5410                PTA 2569
035JN001F06  5411                PTA 2567
035JN003E06  5412                PTA 2559
035JN013C11  5407                PTA 2563
035JN013F03  5415                PTA 2570

ATCC = American Type Culture Collection
CMCC = Chiron Master Culture Collection
All deposits made 10th Apr. 2000

TABLE 8 Sequence listing

SEQ ID     DESCRIPTION
1          U5 region of herv-k(hml-2.hom) [GenBank AF074086]
2          U3 region of herv-k(hml-2.hom)
3          R region of herv-k(hml-2.hom)
4          RU5 region of herv-k(hml-2.hom)
5          U3R region of herv-k(hml-2.hom)
6          Non-coding region between U5 and first 5′ splice site of herv-k(hml-2.hom)
7          Composite of three HERV-K(CH) polynucleotides [SEQ IDs 14-16] positioned in the gag region
8 & 9      Composite of four HERV-K(CH) polynucleotides [SEQ IDs 17-20] positioned in the 5′ pol region
10         Composite of six HERV-K(CH) polynucleotides [SEQ IDs 21-26] positioned in the 3′ pol region
11         Consensus sequence of HERV-K(CH) gag region
12         Consensus sequence of HERV-K(CH) 5′ pol region
13         Consensus sequence of HERV-K(CH) 3′ pol region
14         Sequence for clone 035JN002.E02
15         Sequence for clone 035JN023.F12
16         Sequence for clone 035JN013.H09
17         Sequence for clone 035JN013.C11
18         Sequence for clone 035JN003.E06
19         Sequence for clone 35JN001.F06
20         Sequence for clone 035JN013.F03
21         Sequence for clone 035JN020.D07
22         Sequence for clone 035JN015.F06
23         Sequence for clone 035JN003.G09
24         Sequence for clone 035JN020.B12
25         Sequence for clone 035JN022.G09
26         Sequence for clone 035JN010.A09
27         Sequence for clone 035JN002.E02
28         Sequence for clone 035JN023.F12
29         Sequence for clone 035JN013.H09
30         Sequence for clone 035JN013.C11
31         Sequence for clone 035JN003.E06
32         Sequence for clone 035JN001.F06
33         Sequence for clone 035JN013.F03
34         Sequence for clone 035JN020.D07
35         Sequence for clone 035JN015.F06
36         Sequence for clone 035JN003.G09
37         Sequence for clone 035JN020.B12
38         Sequence for clone 035JN022.G09
39         Sequence for clone 035JN010.A09
40         Sequence for clone 037XN001.D10 and isolated from normal prostate tissue
41         Sequence for clone 037XN001.D10 and isolated from normal prostate tissue
42         EST polynucleotide sequence shown in GenBank accession number Q60732
43         EST polynucleotide sequence SEQ ID 407 of WO 00/04149
44         Polynucleotide sequence for HERV-KII
45         Polynucleotide sequence for HERV-K10
46-49      Amino acid translations of SEQ IDs 11, 14, 15, 16
50-55      Amino acid translations of SEQ IDs 21-26 (note PSFGK motifs)
56-57      Amino acid translations of SEQ IDs 27 & 28
58         Consensus polypeptide sequence inferred from SEQ IDs 21-26
59-82      Polynucleotide probes not in SEQ IDs 42-45
83 & 84    Polynucleotide probes shared with SEQ IDs 42-45
85         HERV-K108 gag CDS
86         HERV-K108 prt CDS
87         HERV-K108 pol CDS
88         HERV-K108 env CDS
89         HERV-K108 cORF 5′ CDS
90         HERV-K108 cORF 3′ CDS
91         HERV-K(C7) gag CDS
92         HERV-K(C7) gag amino acid sequence
93         HERV-K(C7) pol CDS
94         HERV-K(C7) pol amino acid sequence
95         HERV-K(C7) env CDS
96         HERV-K(C7) env amino acid sequence
97         HERV-K(II) gag CDS
98         HERV-K(II) gag amino acid sequence
99         HERV-K(II) prt CDS
100        HERV-K(II) pol CDS
101        HERV-K(II) env CDS
102        HERV-K10 gag CDS
103        HERV-K10 gag(i)
104        HERV-K10 gag(ii)
105        HERV-K10 prt CDS
106        HERV-K10 prt amino acid sequence
107        HERV-K10 pol/env CDS
108        HERV-K10 pol/env amino acid sequence
109        cORF amino acid sequence
110-132    Table 2 probes (cont'd at SEQ IDs 215-225)
133-145    Table 3 probes
146        HML-2.HOM (‘ERVK6’) gag amino acid sequence
147        HML-2.HOM (‘ERVK6’) prt amino acid sequence
148        HML-2.HOM (‘ERVK6’) pol amino acid sequence
149        HML-2.HOM (‘ERVK6’) env amino acid sequence
150        LTR of herv-k(hml-2.hom)
151-154    HML-2 LTR sequences
155 & 156  herv-k(hml-2.hom) RU5 region (5′ and 3′ regions, respectively)
157        Env consensus nucleic acid sequence (FIG. 6)
158        Gag consensus sequence (FIG. 7)
159        Pol consensus sequence (FIG. 8)
160        Env consensus sequence (FIG. 9)
161-214    Table 1 probes
215-225    Table 2 probes (cont'd from SEQ IDs 110-132)

TABLE 9 Expression of HERV-H and HERV-K in prostate tumors

GenBank ID  HERV  HML Subgroup  Result
AB047240    K     HML-2         65
AF164611    K     HML-2         63
AF164612    K     HML-2         63
AF079797    K     HML-6         3
BC005351    H     -             0
XM_054932   H     -             0

The “Result” column gives the % of patient samples which showed up-regulation of the GenBank sequence given in the first column in tumor tissue relative to non-tumor tissue.

TABLE 10 Expression of HERV-K viruses in colon and breast tumors

                                Result
GenBank ID  HERV  HML Subgroup  Prostate  Breast  Colon
AB047240    K     HML-2         65        0       2
AF079797    K     HML-6         3         6       0
AF164611    K     HML-2         63        0       2
AF164612    K     HML-2         63        6       2

The “Result” columns give the % of patient samples which showed up-regulation of the GenBank sequence given in the first column in tumor tissue relative to non-tumor tissue.

TABLE 11 HML-2 subgroup of HERV-K Family

Query  Query Length  Target Locus     Target Length  Score  Pscore   Matches  Similarities  Percent Alignment  Percent Query  Target Description
N4     7428          NT_022283S1.2    102399         72570  3.9E−47  7334     7334          98                 98             /contig_orient=none/start=1/end=160119/chrom=2 Homo
N4     7428          NT_007386S1.3    102399         72570  3.9E−47  7334     7334          98                 98             /contig_orient=complement/start=1/end=250001/chrom=6
N4     7428          NT_009509S2.3    102399         72379  5.3E−47  7329     7329          98                 98             /contig_orient=forward/start=250002/end=500002/chrom=12
N4     7428          NT_009151S32.3   102399         72707  3.1E−47  7345     7345          98                 98             /contig_orient=complement/start=7623180/end=7873180
N4     7428          NT_023901S1.2    102399         70366  1.3E−45  7222     7222          97                 97             /contig_orient=none/start=1/end=166310/chrom=8
N4     7428          NT_025820S3.2    102399         67986  5.9E−44  7112     7112          95                 95             /contig_orient=complement/start=4556361/end=661270
N4     7428          NT_024249S1.2    102399         67986  5.9E−44  7112     7112          95                 95             /contig_orient=none/start=1/end=167403/chrom=11 Homo
N4     7428          NT_011519S9.5    102399         68342  3.4E−44  7058     7058          95                 95             /contig_orient=forward/start=2016320/end=2266320
N4     7428          NT_006788S1.3    102399         68610  2.6E−44  7066     7066          95                 95             /contig_orient=complement/start=1/end=250000/chrom=5
N4     7428          NT_004858S5.3    102399         68624  2.1E−44  7072     7072          95                 95             /contig_orient=complement/start=999278/end=1248551
N4     7428          NT_005795S3.3    102399         67968  6.1E−44  7040     7040          94                 94             /contig_orient=forward/start=405779/end=655779/chrom=3
N4     7428          NT_025140S3.3    97618          68168  4.4E−44  7049     7049          94                 94             /contig_orient=none/start=449919/end=649836/chrom=19
N4     7428          NT_009334S8.3    102399         65447  3.4E−42  6913     6913          93                 93             /contig_orient=complement/start=1574760/end=1824759
N4     7428          NT_004406S4.3    85887          65099  6E−42    6910     6910          92                 93             /contig_orient=forward/start=797371/end=985557/chrom=1
N4     7428          NT_011192S4.3    102399         62351  4.9E−40  6844     6844          90                 92             /contig_orient=forward/start=750004/end=949429/chrom=19
N4     7428          NT_007592S14.3   102399         56493  5.7E−36  6795     6795          79                 91             /contig_orient=forward/start=3276099/end=3526099/chrom=6
N4     7428          NT_011512S23.3   102399         64096  3E−41    6818     6818          92                 91             /contig_orient=forward/start=5505084/end=5755084
N4     7428          NT_019638S1.3    102399         57114  2.1E−36  6273     6273          82                 90             /contig_orient=none/start=1/end=250001/chrom=19
N4     7428          NT_022411S1.3    61348          65630  2.6E−42  6734     6734          95                 90             /contig_orient=none/start=1/end=163648/chrom=3
N4     7428          NT_005632S1.3    102399         62739  2.6E−40  6630     6630          91                 89             /contig_orient=complement/start=1/end=214350/chrom=3
N4     7428          NT_022504S1.3    102399         56001  1.3E−35  6420     6420          86                 86             /contig_orient=forward/start=1/end=271641/chrom=3
N4     7428          NT_023397S2.3    102399         49492  4.1E−3   6275     6275          79                 84             /contig_orient=complement/start=250002/end=455242
N4     7428          NT_011520S13.5   102399         47530  9.6E−30  6166     6166          77                 83             /contig_orient=forward/start=3068083/end=3318083
N4     742           NT_019483S4.3    102399         50179  1.4E−3   6184     6184          82                 83             /contig_orient=forward/start=750003/end=1000003/chrom=8
N4     742           NT_019483S2.3    102399         50122  1.5E−31  6177     6177          82                 83             /contig_orient=forward/start=250001/end=500001/chrom=8
N4     742           NT_024033S5.3    102399         57370  1.4E−36  5859     5859          97                 78             /contig_orient=forward/start=1000005/end=1250005
N4     742           NT_023628S1.3    102399         56440  6.2E−36  5651     565           99                 76             /contig_orient=complement/start=1/end=151365/chrom=7
N4     742           NT_023323S1.2    102399         45124  4.5E−28  5600     5600          82                 75             /contig_orient=none/start=1/end=103061/chrom=5 Homo

Query  Query Length  Target Locus     Query Start  Query End  Target Start  Target End  Gap Open Penalty  Gap Extension Penalty  Target Description
N4     7428          NT_022283S1.2    7428         1          13506         20899       −20               −5                     /contig_orient=none/start=1/end=160119/chrom=2 Homo
N4     7428          NT_007386S1.3    1            7428       29800         37193       −20               −5                     /contig_orient=complement/start=1/end=250001/chrom=6
N4     7428          NT_009509S2.3    7428         1          136539        143951      −20               −5                     /contig_orient=forward/start=250002/end=500002/chrom=12
N4     7428          NT_009151S32.3   1            7428       114716        122137      −20               −5                     /contig_orient=complement/start=7623180/end=7873180
N4     7428          NT_023901S1.2    7428         1          94194         101616      −20               −5                     /contig_orient=none/start=1/end=166310/chrom=8
N4     7428          NT_025820S3.2    1            7428       164603        172033      −20               −5                     /contig_orient=complement/start=4556361/end=661270
N4     7428          NT_024249S1.2    1            7428       18873         26303       −20               −5                     /contig_orient=none/start=1/end=167403/chrom=11 Homo
N4     7428          NT_011519S9.5    1            7428       62776         69910       −20               −5                     /contig_orient=forward/start=2016320/end=2266320
N4     7428          NT_006788S1.3    1            7428       144115        151250      −20               −5                     /contig_orient=complement/start=1/end=250000/chrom=5
N4     7428          NT_004858S5.3    7428         1          23642         30777       −20               −5                     /contig_orient=complement/start=999278/end=1248551
N4     7428          NT_005795S3.3    1            7428       122036        129165      −20               −5                     /contig_orient=forward/start=405779/end=655779/chrom=3
N4     7428          NT_025140S3.3    7428         1          174979        182103      −20               −5                     /contig_orient=none/start=49919/end=649836/chrom=19
N4     7428          NT_009334S8.3    7428         1          116705        123823      −20               −5                     /contig_orient=complement/start=1574760/end=1824759
N4     7428          NT_004406S4.3    1            7428       103675        110860      −20               −5                     /contig_orient=forward/start=797371/end=985557/chrom=1
N4     7428          NT_011192S4.3    7428         1          17828         25313       −20               −5                     /contig_orient=forward/start=750004/end=949429/chrom=19
N4     7428          NT_007592S14.3   1            7428       141741        150230      −20               −5                     /contig_orient=forward/start=3276099/end=3526099/chrom=6
N4     7428          NT_011512S23.3   7428         38         93282         100383      −20               −5                     /contig_orient=forward/start=5505084/end=5755084
N4     7428          NT_019638S1.3    7427         1          179318        187384      −20               −5                     /contig_orient=none/start=1/end=250001/chrom=19
N4     7428          NT_022411S1.3    7024         1          140614        147637      −20               −5                     /contig_orient=none/start=1/end=163648/chrom=3
N4     7428          NT_005632S1.3    7428         7428       48116         55040       −20               −5                     /contig_orient=complement/start=1/end=214350/chrom=3
N4     7428          NT_022504S1.3    7428         1          176629        183705      −20               −5                     /contig_orient=forward/start=1/end=271641/chrom=3
N4     7428          NT_023397S2.3    1            7428       25289         33146       −20               −5                     /contig_orient=complement/start=250002/end=455242
N4     7428          NT_011520S13.5   1            7425       146733        154470      −20               −5                     /contig_orient=forward/start=3068083/end=3318083
N4     742           NT_019483S4.3    7428         1          13951         21321       −20               −5                     /contig_orient=forward/start=750003/end=1000003/chrom=8
N4     742           NT_019483S2.3    7428         1          131375        138737      −20               −5                     /contig_orient=forward/start=250001/end=500001/chrom=8
N4     742           NT_024033S5.3    1            5981       41571         47546       −20               −5                     /contig_orient=forward/start=1000005/end=1250005
N4     742           NT_023628S1.3    1772         7428       1             5656        −20               −5                     /contig_orient=complement/start=1/end=151365/chrom=7
N4     742           NT_023323S1.2    6704         1          1             6758        −20               −5                     /contig_orient=none/start=1/end=103061/chrom=5 Homo


1. An isolated polynucleotide comprising: (a) a nucleotide sequence of or corresponding to an RNA expression product of a human endogenous MMTV-like subgroup 2 (HML-2) retrovirus, (b) a fragment of at least 7 nucleotides of (a), (c) a nucleotide sequence having at least 75% identity to (a), or (d) the complement of (a), (b), or (c), wherein said HML-2 retrovirus is HERV-K(CH).
2. The isolated polynucleotide of claim 1, wherein said RNA expression product comprises a Gag or Pol encoding sequence of HERV-K(CH).
3. The isolated polynucleotide of claim 2, wherein said RNA expression product comprises a nucleotide sequence corresponding to a DNA sequence selected from the group consisting of SEQ ID NOS: 14-26.
4. A method for the treatment or diagnosis of prostate cancer, testicular cancer, multiple sclerosis or insulin-dependent diabetes mellitus, the method comprising administering to a patient, or contacting a biological sample of the patient with, an isolated polynucleotide of claim 1.
5. The method of claim 4, for the treatment of prostate cancer.
6. An isolated polynucleotide having formula 5′-A-B-C-3′, wherein: -A- is a nucleotide sequence consisting of a nucleotides; -C- is a nucleotide sequence consisting of c nucleotides; and -B- is a nucleotide sequence consisting of either (a) a fragment of at least 7 nucleotides of or corresponding to an RNA expression product of a human endogenous MMTV-like subgroup 2 (HML-2) retrovirus, or (b) the complement of a fragment (a), wherein (i) said polynucleotide is neither (a) nor (b), (ii) a+c≧1, and (iii) said HML-2 retrovirus is HERV-K(CH).
7. The isolated polynucleotide of claim 6, wherein said RNA expression product comprises a Gag or Pol encoding sequence of HERV-K(CH).
8. The isolated polynucleotide of claim 7, wherein said RNA expression product comprises a nucleotide sequence corresponding to a DNA sequence selected from the group consisting of SEQ ID NOS: 14-26.
9. The isolated polynucleotide of claim 1 or claim 6, comprising a detectable label.
10. A kit comprising primers for amplifying a template sequence contained within an isolated polynucleotide of claim 1, said kit comprising a first primer and a second primer, wherein said first primer is substantially complementary to said template sequence and said second primer is substantially complementary to a complement of said template sequence, wherein parts of said primers that have complementarity define the termini of said template sequence to be amplified.
11. An isolated polypeptide comprising: (a) an amino acid sequence encoded by a nucleotide sequence of an RNA expression product of a human endogenous MMTV-like subgroup 2 (HML-2) retrovirus, (b) a fragment of at least 7 amino acids of (a), or (c) an amino acid sequence having at least 75% identity to (a), wherein said HML-2 retrovirus is HERV-K(CH).
12. The isolated polypeptide of claim 11, wherein (a) is an amino acid selected from the group consisting of SEQ ID NOS: 46-57.
13. The isolated polypeptide of claim 11, wherein said RNA expression product comprises a Gag or Pol encoding sequence of HERV-K(CH).
14. The isolated polypeptide of claim 13, wherein said RNA expression product comprises a nucleotide sequence corresponding to a DNA sequence selected from the group consisting of SEQ ID NOS: 14-26.
15. An isolated polypeptide having a formula NH₂-A-B-C-COOH, wherein -A- is an amino acid sequence consisting of a amino acids; -C- is an amino acid sequence consisting of c amino acids; and -B- is a fragment of at least 5 amino acids of an amino acid sequence encoded by a nucleotide sequence of an RNA expression product of a human endogenous MMTV-like subgroup 2 (HML-2) retrovirus, wherein (i) said polypeptide is not a fragment of an amino acid sequence encoded by a nucleotide sequence of said RNA expression product, (ii) a+c≧1, and (iii) wherein said HML-2 retrovirus is HERV-K(CH).
16. The isolated polypeptide of claim 15, wherein -B- is a fragment of at least 5 amino acids of an amino acid sequence selected from the group consisting of SEQ ID NOS: 46-57.
17. The isolated polypeptide of claim 15, wherein said RNA expression product comprises a Gag or Pol encoding sequence of HERV-K(CH).
18. The isolated polypeptide of claim 17, wherein said RNA expression product comprises a nucleotide sequence corresponding to a DNA sequence selected from the group consisting of SEQ ID NOS: 14-26.
19. A polypeptide of claim 11 or claim 15, wherein said polypeptide is attached to a solid support.
20. A polypeptide of claim 11 or claim 15, wherein said polypeptide comprises a detectable label.
21. An antibody for use in the diagnosis of prostate cancer, said antibody having binding affinity for the polypeptide of claim 11 or claim 15.
22. The antibody of claim 21, wherein said antibody is a monoclonal antibody.
23. The antibody of claim 21, wherein said antibody is attached to a solid support.
24. A pharmaceutical composition comprising: (a) a polynucleotide of claim 1 or claim 6, a polypeptide of claim 11 or claim 17, or an antibody of claim 23, and (b) a pharmaceutically acceptable carrier.
25. An immunogenic composition comprising: (a) a polynucleotide of claim 1 or claim 6 or a polypeptide of claim 11 or claim 17, and (b) a pharmaceutically acceptable carrier.
26. The immunogenic composition of claim 25, further comprising an adjuvant.
27. The immunogenic composition of claim 25, wherein said adjuvant comprises an oil-in-water emulsion or an aluminum salt.
28. A method of raising an immune response in a patient, the method comprising administering an immunogenic dose of the immunogenic composition of claim 25 to said patient.
29. A composition comprising: (a) a prostate cell, and (b) a polynucleotide of claim 1 or claim 6, a polypeptide of claim 11 or claim 15, or an antibody of claim 21, and (c) a pharmaceutically acceptable carrier.